[Pixman] [PATCH] float-combiner.c: Change tests for x == 0.0 tests to - FLT_MIN < x < FLT_MIN
Siarhei Siamashka
siarhei.siamashka at gmail.com
Tue Dec 18 16:13:31 PST 2012
On Mon, 17 Dec 2012 03:34:34 +0100
sandmann at cs.au.dk (Søren Sandmann) wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>
> > I just wonder how big the performance cost of adding an extra
> > comparison operation is. Probably much less than using the -ffloat-store,
> > -fexcess-precision=standard, and -std=c99 options, but it might be
> > interesting to confirm.
>
> It's not going to matter all that much in any case since we are talking
> about floating point variants of operations that involve
> divisions. These are not used that much, and the divisions will tend to
> swamp a lot of the difference.
>
> However, I added conjoint_over_8888_2a10 to lowlevel-blt-test and did
> some measurements:
>
> As a baseline, current master compiled with -m32 and == 0.0f checks:
>
> conjoint_over_8888_2a10 = L1: 5.62 L2: 5.67 M: 5.65 ( 0.50%) HT: 5.59 VT: 5.52 R: 5.49 RT: 5.06 ( 68Kops/s)
>
> With the FLT_MIN checks:
>
> conjoint_over_8888_2a10 = L1: 5.68 L2: 5.73 M: 5.72 ( 0.51%) HT: 5.65 VT: 5.53 R: 5.45 RT: 5.02 ( 67Kops/s)
>
> The numbers are actually slightly better with the checks, so I suspect
> the difference is just noise (although conceivably, the checks may
> filter out more divisions than before).
>
> When just pixman-combine-float.c is compiled with -ffloat-store:
>
> conjoint_over_8888_2a10 = L1: 5.58 L2: 5.60 M: 5.60 ( 0.50%) HT: 5.53 VT: 5.44 R: 5.41 RT: 4.99 ( 67Kops/s)
>
> The numbers here are slightly worse than the baseline, but possibly
> still just noise.
>
> If all of pixman is compiled with -ffloat-store:
>
> conjoint_over_8888_2a10 = L1: 4.31 L2: 4.34 M: 4.31 ( 0.38%) HT: 4.26 VT: 4.21 R: 4.14 RT: 3.92 ( 53Kops/s)
>
> the numbers are clearly worse.
>
> Finally, the numbers in x86_64 mode. Current master:
>
> conjoint_over_8888_2a10 = L1: 19.09 L2: 19.58 M: 19.13 ( 1.75%) HT: 17.47 VT: 17.35 R: 17.32 RT: 13.72 ( 178Kops/s)
>
> With FLT_MIN checks:
>
> conjoint_over_8888_2a10 = L1: 19.09 L2: 19.59 M: 19.51 ( 1.76%) HT: 17.52 VT: 17.02 R: 17.00 RT: 13.43 ( 175Kops/s)
>
> I.e., no real difference.
Agreed, now it looks clear. Thanks for the detailed benchmark results.
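The patch in the subject line replaces exact `x == 0.0f` guards in front of divisions with a `-FLT_MIN < x < FLT_MIN` range test. The C sketch below is a simplified illustration, not code from pixman-combine-float.c. The function names `is_zero_ish`, `ratio_eq_check` and `ratio_flt_min_check` are hypothetical, and so is the 0.0f fallback value. It shows one plain-float case where the two guards behave differently. A subnormal divisor passes the `== 0.0f` test, but dividing 1.0f by it overflows to infinity. The range test sends that divisor to the fallback branch instead.

```c
#include <float.h>

/* Hypothetical helper (illustrative name, not pixman's API):
 * true for +0, -0 and every subnormal float, i.e. |f| < FLT_MIN. */
static int
is_zero_ish (float f)
{
    return -FLT_MIN < f && f < FLT_MIN;
}

/* Old-style guard: only an exact zero takes the fallback branch.
 * The fallback value 0.0f is an arbitrary choice for this sketch. */
static float
ratio_eq_check (float num, float den)
{
    if (den == 0.0f)
        return 0.0f;
    return num / den;      /* can overflow to +/-inf when den is subnormal */
}

/* Range guard as described in the patch subject: anything strictly
 * between -FLT_MIN and FLT_MIN is treated as zero, so the division
 * below never sees a subnormal divisor. */
static float
ratio_flt_min_check (float num, float den)
{
    if (is_zero_ish (den))
        return 0.0f;
    return num / den;
}
```

Take `den = FLT_MIN / 1024.0f`, which is the subnormal 2^-136. `ratio_eq_check (1.0f, den)` takes the division branch. Because 2^136 exceeds FLT_MAX (about 2^128), the single-precision result is +inf. `ratio_flt_min_check (1.0f, den)` returns the fallback 0.0f. On x87 builds without -ffloat-store or -fexcess-precision=standard, the quotient may stay in extended precision, so whether the overflow is visible can depend on where the value is finally rounded to float.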
--
Best regards,
Siarhei Siamashka