[cairo] pixman: New ARM NEON optimizations

Siarhei Siamashka siarhei.siamashka at gmail.com
Tue Feb 16 07:56:45 PST 2010


On Thursday 05 November 2009, Chris Wilson wrote:
> Before going too far along the performance, make sure you are also
> checking with Søren's work in reducing overheads.
>
> Also I'm very interested in seeing what the profiles look like for
> pixman and cairo on ARM. Just knowledge of the behaviour of your target
> applications would be useful when thinking about how to tune cairo.

Here are the results of running the standard set of cairo-perf-trace tests on a
600MHz ARM Cortex-A8 with 128MB RAM (beagleboard B7):
http://people.freedesktop.org/~siamashka/files/20100216/pixman-0.17.6/

Maybe I will use this board (otherwise it would just be collecting dust) to
run 24/7 unattended for the sole purpose of tracking all the changes to
pixman git master and some selected development branches. It could
potentially run regression tests and a variety of benchmarks, presenting the
results nicely on a webpage. No promises on this part though.


The problem with cairo-perf-trace is that it does not cover the 16bpp desktop
color depth well, even though that depth is still used on a lot of ARM
devices. Also, a lot of the standard tests run too long and are very memory
hungry (the 'ocitysmap' test is practically impossible to run on this
hardware due to excessive swapping).

These particular callgraphs show that:
1. A lot of CPU time is spent in the kernel in some tests (doing heavy
swapping, IPC, or anything else). Getting callgraphs for the kernel as well
would provide more details. But this can be ignored for now.
2. Not all the fast paths have NEON optimizations yet.
3. The 'double -> int64_t' conversion in the 'radial_gradient_get_scanline_32'
function does not map directly to any ARM instruction and uses a very slow
call to a software emulation library (which is itself implemented by calling
other SW emulation functions even for operations supported by HW, causing
even more slowdown).
4. 'image_from_pict' takes a huge amount of CPU time for the 'xlib' backend in
some tests.
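
To illustrate item 3, here is a minimal sketch of the kind of conversion
involved. It is not the actual pixman code: the function names and the 16.16
scaling are assumptions made up for this example. On 32-bit ARM, compilers
typically turn the first cast into a call to a runtime helper (such as
__aeabi_d2lz under the ARM EABI), while a 'double -> int32_t' cast can be done
by a single VFP conversion instruction. The second function is only valid if
the scaled value is known to fit in 32 bits, which would have to be checked
for the real gradient code before trying anything like this.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: converts a double to a
 * 64-bit integer holding a 16.16 fixed-point value.  On 32-bit ARM the
 * cast below usually becomes a library call instead of an instruction.
 */
static int64_t
double_to_fixed_via_int64 (double v)
{
    return (int64_t) (v * 65536.0);
}

/*
 * Hypothetical alternative: the same result when the scaled value is
 * known to fit in int32_t (roughly |v| < 32768).  The double -> int32_t
 * cast can be done in hardware by VFP; widening to int64_t afterwards
 * is plain integer sign extension.
 */
static int64_t
double_to_fixed_via_int32 (double v)
{
    return (int64_t) (int32_t) (v * 65536.0);
}
```

Both functions truncate toward zero and agree for inputs in the 32-bit range,
so comparing them over the values a given test actually produces is a cheap
way to see whether the narrower conversion would be safe.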

A lot of this was already known. Just having it presented in some usable form
on a webpage may give better visibility into what needs to be fixed on the
performance front.

-- 
Best regards,
Siarhei Siamashka

