[Pixman] [PATCH] float-combiner.c: Change tests for x == 0.0 tests to - FLT_MIN < x < FLT_MIN

Sun Dec 16 18:34:34 PST 2012

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> I just wonder how big is the performance cost for adding an extra
> comparison operation. Probably much less than using -ffloat-store,
> -fexcess-precision=standard, and -std=c99 options, but might be
> interesting to confirm.

It's not going to matter all that much in any case since we are talking
about floating point variants of operations that involve
divisions. These are not used that much, and the divisions will tend to
swamp a lot of the difference.

However, I added conjoint_over_8888_2a10 to lowlevel-blt-test and did
some measurements:

As a baseline, current master compiled with -m32 and == 0.0f checks:

    conjoint_over_8888_2a10 =  L1:   5.62  L2:   5.67  M:  5.65 (  0.50%) HT:  5.59  VT:  5.52  R:  5.49  RT:  5.06 (  68Kops/s)

With the FLT_MIN checks:

    conjoint_over_8888_2a10 =  L1:   5.68  L2:   5.73  M:  5.72 (  0.51%) HT:  5.65  VT:  5.53  R:  5.45  RT:  5.02 (  67Kops/s)

The numbers are actually slightly better with the checks, so I suspect
the difference is just noise (although conceivably, the checks may
filter out more divisions than before).
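
For readers without the patch at hand, the two kinds of test being compared can be sketched as below. This is only an illustration: the two comparisons are taken from the patch subject, while the helper names (is_zero_exact, is_zero_flt_min, safe_div, flt_min_check_demo) and the fallback-value convention are hypothetical and are not taken from pixman-combine-float.c.

```c
#include <assert.h>   /* assert, used by the self-check below */
#include <float.h>    /* FLT_MIN */
#include <stdbool.h>

/* Old style of test: true only for exactly +0.0f or -0.0f. */
static bool is_zero_exact(float x)
{
    return x == 0.0f;
}

/* New style of test: true for zero and for any value whose magnitude
 * is below FLT_MIN, the smallest normalized positive float. */
static bool is_zero_flt_min(float x)
{
    return -FLT_MIN < x && x < FLT_MIN;
}

/* Hypothetical guarded division: skip the divide and return a
 * caller-supplied fallback when the denominator counts as zero. */
static float safe_div(float num, float den, float fallback)
{
    if (is_zero_flt_min(den))
        return fallback;
    return num / den;
}

/* Small self-check of the behaviour described in the comments above. */
static void flt_min_check_demo(void)
{
    assert(is_zero_exact(0.0f));
    assert(is_zero_flt_min(0.0f));
    assert(is_zero_flt_min(-0.0f));
    assert(is_zero_flt_min(FLT_MIN / 2.0f));   /* below FLT_MIN */
    assert(!is_zero_flt_min(FLT_MIN));         /* bounds are exclusive */
    assert(!is_zero_flt_min(-FLT_MIN));
    assert(safe_div(6.0f, 2.0f, 0.0f) == 3.0f);
    assert(safe_div(1.0f, 0.0f, 7.0f) == 7.0f);
}
```

The range test costs one extra comparison per check, but it also rejects denominators that are tiny without being exactly zero, which is how it could end up skipping more divisions than the == 0.0f form.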

When just pixman-combine-float.c is compiled with -ffloat-store:

    conjoint_over_8888_2a10 =  L1:   5.58  L2:   5.60  M:  5.60 (  0.50%) HT:  5.53  VT:  5.44  R:  5.41  RT:  4.99 (  67Kops/s)

The numbers here are slightly worse than the baseline, but possibly
still just noise.

If all of pixman is compiled with -ffloat-store:

    conjoint_over_8888_2a10 =  L1:   4.31  L2:   4.34  M:  4.31 (  0.38%) HT:  4.26  VT:  4.21  R:  4.14  RT:  3.92 (  53Kops/s)

the numbers are clearly worse.

Finally, the numbers in x86_64 mode. Current master:

    conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.58  M: 19.13 (  1.75%) HT: 17.47  VT: 17.35  R: 17.32  RT: 13.72 ( 178Kops/s)

With FLT_MIN checks:

    conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.59  M: 19.51 (  1.76%) HT: 17.52  VT: 17.02  R: 17.00  RT: 13.43 ( 175Kops/s)

I.e., no real difference.

Søren