[Pixman] [PATCH] float-combiner.c: Change tests for x == 0.0 tests to - FLT_MIN < x < FLT_MIN

Tue Dec 18 16:13:31 PST 2012

On Mon, 17 Dec 2012 03:34:34 +0100
sandmann at cs.au.dk (Søren Sandmann) wrote:

> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> 
> > I just wonder how big the performance cost of adding an extra
> > comparison operation is. Probably much less than using the -ffloat-store,
> > -fexcess-precision=standard, and -std=c99 options, but it might be
> > interesting to confirm.
> 
> It's not going to matter all that much in any case since we are talking
> about floating point variants of operations that involve
> divisions. These are not used that much, and the divisions will tend to
> swamp a lot of the difference.
> 
> However, I added conjoint_over_8888_2a10 to lowlevel-blt-test and did
> some measurements:
> 
> As a baseline, current master compiled with -m32 and == 0.0f checks:
> 
>     conjoint_over_8888_2a10 =  L1:   5.62  L2:   5.67  M:  5.65 (  0.50%) HT:  5.59  VT:  5.52  R:  5.49  RT:  5.06 (  68Kops/s)
> 
> With the FLT_MIN checks:
> 
>     conjoint_over_8888_2a10 =  L1:   5.68  L2:   5.73  M:  5.72 (  0.51%) HT:  5.65  VT:  5.53  R:  5.45  RT:  5.02 (  67Kops/s)
> 
> The numbers are actually slightly better with the checks, so I suspect
> the difference is just noise (although conceivably, the checks may
> filter out more divisions than before).
> 
> When just pixman-combine-float.c is compiled with -ffloat-store:
> 
>     conjoint_over_8888_2a10 =  L1:   5.58  L2:   5.60  M:  5.60 (  0.50%) HT:  5.53  VT:  5.44  R:  5.41  RT:  4.99 (  67Kops/s)
> 
> The numbers here are slightly worse than the baseline, but possibly
> still just noise.
> 
> If all of pixman is compiled with -ffloat-store:
> 
>     conjoint_over_8888_2a10 =  L1:   4.31  L2:   4.34  M:  4.31 (  0.38%) HT:  4.26  VT:  4.21  R:  4.14  RT:  3.92 (  53Kops/s)
> 
> the numbers are clearly worse.
> 
> Finally, the numbers in x86_64 mode. Current master:
> 
>     conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.58  M: 19.13 (  1.75%) HT: 17.47  VT: 17.35  R: 17.32  RT: 13.72 ( 178Kops/s)
> 
> With FLT_MIN checks:
> 
>     conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.59  M: 19.51 (  1.76%) HT: 17.52  VT: 17.02  R: 17.00  RT: 13.43 ( 175Kops/s)
> 
> I.e., no real difference.

Agreed, now it looks clear. Thanks for the detailed benchmark results.
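
For anyone reading this thread later, here is a minimal sketch of the two
kinds of checks being compared. The helper names (is_zero_exact,
is_zero_flt_min, safe_div) are made up for illustration and are not taken
from pixman; the real code in pixman-combine-float.c may be structured
differently. Only the comparisons themselves follow the patch subject.

```c
#include <float.h>
#include <stdbool.h>

/* Exact test, like the "== 0.0f checks" in the baseline:
 * true only for +0.0f and -0.0f. */
static bool
is_zero_exact (float f)
{
    return f == 0.0f;
}

/* Range test from the patch subject: true when -FLT_MIN < f < FLT_MIN,
 * i.e. for zero and for values smaller in magnitude than the smallest
 * normalized float. */
static bool
is_zero_flt_min (float f)
{
    return -FLT_MIN < f && f < FLT_MIN;
}

/* Hypothetical helper (an assumption, not pixman API): skip the
 * division when the divisor is considered zero by the range test. */
static float
safe_div (float a, float b, float fallback)
{
    return is_zero_flt_min (b) ? fallback : a / b;
}
```

The range test costs one extra comparison per check, which is the overhead
the benchmark numbers above were measuring.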

-- 
Best regards,
Siarhei Siamashka