Siarhei Siamashka siarhei.siamashka at gmail.com
Fri Feb 19 03:57:34 PST 2010

```
Hello,

Some of the cairo-perf traces show that a significant share of CPU cycles is
spent in radial gradient processing. I have sent a patch/RFC in another e-mail
with some changes to this code which should fix one of its performance
bottlenecks on ARM (I'm not sure about potential side effects,
so a review is definitely needed).

Also I have a feeling that double precision is a bit excessive there, and
this functionality can be implemented with single precision floating point
calculations. On ARM Cortex-A8 double precision math is very slow,
taking ~10 cycles or more for each floating point operation, because it
runs on the non-pipelined VFP unit. The NEON unit, on the other hand, is
capable of performing up to 2 single precision floating point operations
per cycle.

> 	while (buffer < end)
> 	{
> 	    {
> 		pixman_fixed_48_16_t t;
> 		double det = B * B + A4 * (pdx * pdx + pdy * pdy - r1sq);
> 		if (det <= 0.)
> 		    t = (pixman_fixed_48_16_t) (B * invA);
> 		else if (invert)
> 		    t = (pixman_fixed_48_16_t) ((B + sqrt (det)) * invA);
> 		else
> 		    t = (pixman_fixed_48_16_t) ((B - sqrt (det)) * invA);
>
> 		*buffer = _pixman_gradient_walker_pixel (&walker, t);
> 	    }
> 	    ++buffer;
>
> 	    pdx += cx;
> 	    pdy += cy;
> 	    B += cB;

Adding small increments to the values at the end of each loop iteration could
be the biggest source of precision loss, because the rounding error of every
addition accumulates. Replacing this with an explicit calculation like
'pdx = pdx0 + cx * n' should improve precision and may make it possible to use
floats freely. And floats work better with SIMD on any platform.

> 	}

In any case, it seems that radial gradients are not covered by any
correctness/precision tests at the moment. This looks like the first
thing which needs to be implemented :)

--
Best regards,
Siarhei Siamashka
```
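
The remark about replacing 'pdx += cx' with 'pdx = pdx0 + cx * n' can be
checked with a small standalone experiment. The following is only a sketch,
not pixman code: the helper names max_error_incremental and
max_error_explicit are made up for illustration, and a double precision
value is assumed to be a good enough reference. Both helpers walk n steps in
single precision and report the worst absolute deviation from the reference.

```c
#include <math.h>

/* Hypothetical helper (not part of pixman): worst absolute error of the
 * running sum 'v = v + step' in single precision over n iterations,
 * measured against a double precision reference. */
static double
max_error_incremental (float v0, float step, int n)
{
    /* volatile forces every partial sum to be rounded to float, even on
     * targets that keep intermediates in wider registers */
    volatile float v = v0;
    double max_err = 0.0;
    int i;

    for (i = 0; i < n; i++)
    {
        double ref = (double) v0 + (double) step * i;
        double err = fabs ((double) v - ref);

        if (err > max_err)
            max_err = err;
        v = v + step;           /* rounding error accumulates here */
    }
    return max_err;
}

/* Hypothetical helper: same measurement, but every value is recomputed
 * from the start value as 'v0 + step * i', so the error of one iteration
 * is not carried over into the next one. */
static double
max_error_explicit (float v0, float step, int n)
{
    double max_err = 0.0;
    int i;

    for (i = 0; i < n; i++)
    {
        float v = v0 + step * (float) i;
        double ref = (double) v0 + (double) step * i;
        double err = fabs ((double) v - ref);

        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
```

Calling both helpers with, for example, v0 = 0.0f, step = 0.1f and
n = 100000 should show the explicit variant staying within about one float
rounding step of the reference, while the incremental one drifts much
further. The step size and iteration count here are arbitrary; real scanline
lengths are far shorter, so the drift seen in practice would be smaller.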