[Pixman] [PATCH] Move fallback decisions from implementations into pixman-cpu.c.

Tue Jan 25 18:01:30 PST 2011

On Monday 24 January 2011 20:02:46 Søren Sandmann wrote:
> Instead of having each individual implementation decide which fallback
> to use, move it into pixman-cpu.c, where a more global decision can be
> made.
> 
> This is accomplished by adding a "fallback" argument to all the
> pixman_implementation_create_*() implementations, and then in
> _pixman_choose_implementation() pass in the desired fallback.
> ---
>  pixman/pixman-arm-neon.c  |    7 +------
>  pixman/pixman-arm-simd.c  |    5 ++---
>  pixman/pixman-cpu.c       |   30 +++++++++++++++++++-----------
>  pixman/pixman-fast-path.c |    5 ++---
>  pixman/pixman-mmx.c       |    5 ++---
>  pixman/pixman-private.h   |   12 ++++++------
>  pixman/pixman-sse2.c      |    7 +------
>  pixman/pixman-vmx.c       |    5 ++---
>  8 files changed, 35 insertions(+), 41 deletions(-)

Looks good to me, especially the reduction in lines of code.
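
To make the idea in the patch description concrete, here is a rough sketch of
what "pass the fallback in from one central place" could look like. It is a toy
model, not the pixman code: impl_t, impl_create() and choose_implementation()
are hypothetical stand-ins for pixman_implementation_t, the
pixman_implementation_create_*() functions and _pixman_choose_implementation().

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for pixman_implementation_t. */
typedef struct impl impl_t;
struct impl {
    const char *name;
    impl_t     *delegate;   /* next implementation to try */
};

/* Each constructor receives its fallback instead of picking one itself. */
static impl_t *impl_create(impl_t *storage, const char *name, impl_t *fallback)
{
    storage->name = name;
    storage->delegate = fallback;
    return storage;
}

/* Toy counterpart of the central chooser: the whole chain is decided here. */
static impl_t *choose_implementation(int have_sse2)
{
    static impl_t general, fast, sse2;
    impl_t *imp = impl_create(&general, "general", NULL);

    imp = impl_create(&fast, "fast-path", imp);
    if (have_sse2)
        imp = impl_create(&sse2, "sse2", imp);
    return imp;
}
```

With this shape, changing which implementation falls back to which only touches
the chooser, not every implementation file.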

By the way, the fallbacks may need some tweaking to make sure that the
fastest code really is selected. For example, after the introduction of
faster fetchers [1], the performance of the 'over_8888_8888' operation
(translucent case) with nearest scaling of a 2000x2000 source image to
approximately the same size looks like this:

C fetcher + C combiner:     ~76.71 MPix/s
C fast path:                ~106.47 MPix/s
C fetcher + SSE2 combiner:  ~182.65 MPix/s
SSE2 fast path:             ~270.57 MPix/s

The important part is that "C fetcher + SSE2 combiner" is now faster
than the full "C fast path", which does everything in a single pass but
without SSE2. So when SSE2 combiners are available, it makes sense to
disable the nearest scaling C fast paths for the OVER operator. It is not
quite that simple though, because this C fast path has special handling
for fully transparent and fully opaque pixels, and may actually be faster
in some cases. But it is clearly slower when worst-case performance is
what matters.
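
As an illustration of the worst-case argument, the sketch below picks the
fastest available path using only the four numbers measured above. The enum,
the table and fastest_available() are made up for this example; real selection
in pixman is not a lookup table, and these figures cover just this one
benchmark.

```c
/* The four code paths from the benchmark above (names are hypothetical). */
typedef enum {
    C_FETCH_C_COMBINE,
    C_FAST_PATH,
    C_FETCH_SSE2_COMBINE,
    SSE2_FAST_PATH,
    N_PATHS
} path_t;

/* Worst-case (translucent) throughput in MPix/s, copied from the table. */
static const double mpix_per_s[N_PATHS] = { 76.71, 106.47, 182.65, 270.57 };

/* Return the available path with the highest measured throughput.
 * The generic C fetcher + C combiner path is assumed to always exist. */
static path_t fastest_available(const int available[N_PATHS])
{
    path_t best = C_FETCH_C_COMBINE;
    int p;

    for (p = 0; p < N_PATHS; p++)
        if (available[p] && mpix_per_s[p] > mpix_per_s[best])
            best = (path_t)p;
    return best;
}
```

If there is no SSE2 fast path for an operation but SSE2 combiners exist, this
picks the two-pass path over the C fast path, which is the case described above.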

Something similar may apply to the PPC Altivec optimizations. Because there
are only Altivec combiners and no full Altivec fast paths, the existing
C fast paths get executed for some compositing operations, which prevents
the Altivec combiners from being used.
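
A simplified model of why this happens, assuming the lookup walks the delegate
chain and takes the first fast path it finds. The struct layout and
who_composites() are hypothetical, and each implementation is reduced to a
single fast-path operation to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified implementation record. */
typedef struct impl impl_t;
struct impl {
    const char *name;
    const char *fast_path_op;      /* op with a single-pass fast path, or NULL */
    int         has_simd_combiner; /* nonzero if SIMD combiners are provided */
    impl_t     *delegate;          /* next implementation in the chain */
};

/* Report which code ends up doing the composite for 'op'. */
static const char *who_composites(const impl_t *top, const char *op)
{
    const impl_t *i;

    for (i = top; i != NULL; i = i->delegate)
        if (i->fast_path_op != NULL && strcmp(i->fast_path_op, op) == 0)
            return i->name;  /* fast path found: combiners are never used */

    /* No fast path anywhere: generic fetch + combine with top-level combiners. */
    return top->has_simd_combiner ? "general + SIMD combiner"
                                  : "general + C combiner";
}
```

For a chain like vmx -> fast-path -> general, an operation that has a C fast
path is handled by "fast-path", so the SIMD combiners of the top-level
implementation are only reached for operations without one.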

1. http://cgit.freedesktop.org/pixman/commit/?id=536cf4dd3bd144ad

-- 
Best regards,
Siarhei Siamashka