[Pixman] Performance of radial gradients
Siarhei Siamashka
siarhei.siamashka at gmail.com
Fri Feb 19 03:57:34 PST 2010
Hello,
Some of the cairo-perf traces show that a significant share of CPU cycles is
spent in radial gradient processing. I have sent a patch/RFC in another e-mail
with some changes to this code, which should fix one of its performance
bottlenecks on ARM (I'm not sure about potential side effects, so a review is
definitely needed).
Also, I have a feeling that double precision is a bit excessive there, and
that this functionality could be implemented with single precision floating
point calculations. On ARM Cortex-A8, double precision math is very slow,
taking ~10 cycles or more for each floating point operation, because it runs
on the non-pipelined VFP unit. The NEON unit, on the other hand, is capable
of performing up to 2 single precision floating point operations per cycle.
From 'radial_gradient_get_scanline_32':
> while (buffer < end)
> {
>     if (!mask || *mask++ & mask_bits)
>     {
>         pixman_fixed_48_16_t t;
>         double det = B * B + A4 * (pdx * pdx + pdy * pdy - r1sq);
>         if (det <= 0.)
>             t = (pixman_fixed_48_16_t) (B * invA);
>         else if (invert)
>             t = (pixman_fixed_48_16_t) ((B + sqrt (det)) * invA);
>         else
>             t = (pixman_fixed_48_16_t) ((B - sqrt (det)) * invA);
>
>         *buffer = _pixman_gradient_walker_pixel (&walker, t);
>     }
>     ++buffer;
>
>     pdx += cx;
>     pdy += cy;
>     B += cB;
Adding small increments to the values at the end of each loop iteration could
be the biggest source of precision loss, since the rounding error of every
addition accumulates along the scanline. Replacing this with an explicit
calculation like 'pdx = pdx0 + cx * n' should improve precision and might make
it possible to use floats freely. And floats work better with SIMD on any
platform.
> }
In any case, it seems that radial gradients are not covered by any
correctness/precision tests at the moment. This looks like the first
thing that needs to be implemented :)
--
Best regards,
Siarhei Siamashka