[cairo] [PATCH/RFC][pixman] More ARM NEON performance updates
sandmann at daimi.au.dk
Thu Dec 10 13:56:50 PST 2009
Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> 2. Some fetch/store functions (r5g6b5 format is the most interesting) benefit
> from SIMD optimizations a lot, at least for ARM NEON:
> This is a little bit inconsistent with the other SIMD optimizations which are
> handled via pixman_implementation_t. So I'm all open to any suggestions about
> how to do it the right way.
First, I think architecture-specific fetchers are a very good
idea. There are a couple of bugs in Bugzilla with SSE2 fetchers for
some formats, and both gradients and bilinear scaling could become
much faster with architecture-specific code.
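A fetcher converts one scanline from the image's storage format into a8r8g8b8 for the compositing code. Below is a minimal scalar sketch of an r5g6b5 fetcher; the function name and signature are hypothetical and are not pixman's internal fetch API. It replicates the high bits into the low bits so that the maximum 5- and 6-bit values expand to 0xff. This inner loop is the kind of work a NEON or SSE2 fetcher would vectorize.

```c
#include <stdint.h>

/* Hypothetical scalar fetcher: expand `width` r5g6b5 pixels to a8r8g8b8. */
static void
fetch_scanline_r5g6b5 (const uint16_t *src, uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++)
    {
        uint32_t p = src[i];
        uint32_t r = (p >> 11) & 0x1f;
        uint32_t g = (p >> 5) & 0x3f;
        uint32_t b = p & 0x1f;

        /* Replicate high bits into low bits so 0x1f and 0x3f become 0xff. */
        r = (r << 3) | (r >> 2);
        g = (g << 2) | (g >> 4);
        b = (b << 3) | (b >> 2);

        /* r5g6b5 has no alpha channel, so the result is fully opaque. */
        dst[i] = 0xff000000u | (r << 16) | (g << 8) | b;
    }
}
```

For example, 0xf800 (pure red in r5g6b5) expands to 0xffff0000.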
The way I have been thinking about it is to have implementations involved
when the images are created. During creation, they could then plug
in their own fetchers. So something along these lines:
- The pixman_image struct will be renamed to something like
pixman_image_common, and it will contain the set of properties that
describe the image completely. E.g., it will contain the
transformation and the filter since these are inherent in what the
image *is*. It will not contain any of the fetcher functions etc.,
because those are essentially just caches - they could be recomputed
from the generic struct if necessary.
- A pixman_image will then be something that the implementation can
create, and it will contain:
  - a pointer to the pixman_image_common
  - fetch/store scanline functions
  - a property_changed() function
  - a pointer to a fallback pixman_image
  - whatever other information the implementations want to cache
    about the image.
- The fetch and store functions can then either do the fetching if
they know how to, or fall back to the fetch/store functions of the
fallback pixman_image.
So, pixman_image_create_bits() would create the common struct, then
call the implementation's create_bits_image(). That function would
fill in the property_changed() function.
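A minimal sketch of the proposed split, assuming simplified a8r8g8b8-only images with rowstride equal to width. The names image_common_t, image_t, implementation_t, plain_create_bits_image() and create_bits() are hypothetical stand-ins for pixman_image_common, the per-implementation pixman_image, create_bits_image() and pixman_image_create_bits(); they are not pixman's real types or signatures. The common struct holds only descriptive properties. The implementation's create_bits_image() installs property_changed(), which in turn fills in the cached fetcher. Calling property_changed() once right after creation is an assumption about where that first call would happen.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: describes what the image *is*; caches live elsewhere. */
typedef struct
{
    int       width, height;
    uint32_t *bits;          /* a8r8g8b8 pixels, rowstride == width */
    int       filter;        /* stand-in for the filter setting */
} image_common_t;

typedef struct image image_t;
typedef void (*fetch_scanline_t) (image_t *image, int x, int y,
                                  int width, uint32_t *buffer);

/* Hypothetical per-implementation image: common data plus caches. */
struct image
{
    image_common_t   *common;
    fetch_scanline_t  fetch_scanline;
    void            (*property_changed) (image_t *image);
    image_t          *fallback;      /* unused here; NULL for the last level */
};

typedef struct
{
    image_t *(*create_bits_image) (image_common_t *common);
} implementation_t;

static void
plain_fetch (image_t *image, int x, int y, int width, uint32_t *buffer)
{
    const uint32_t *row = image->common->bits + (size_t) y * image->common->width;

    for (int i = 0; i < width; i++)
        buffer[i] = row[x + i];
}

/* Recompute the cached fetcher from the common struct. */
static void
plain_property_changed (image_t *image)
{
    image->fetch_scanline = plain_fetch;
}

/* The implementation's create_bits_image() only installs property_changed(). */
static image_t *
plain_create_bits_image (image_common_t *common)
{
    image_t *image = calloc (1, sizeof *image);

    if (image)
    {
        image->common = common;
        image->property_changed = plain_property_changed;
    }
    return image;
}

/* Hypothetical entry point: build the common struct, let the implementation
 * wrap it, then have property_changed() fill in the fetcher. */
static image_t *
create_bits (const implementation_t *imp, int width, int height, uint32_t *bits)
{
    image_common_t *common = calloc (1, sizeof *common);
    image_t *image;

    if (!common)
        return NULL;
    common->width = width;
    common->height = height;
    common->bits = bits;

    image = imp->create_bits_image (common);
    if (!image)
    {
        free (common);
        return NULL;
    }
    image->property_changed (image);
    return image;
}
```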
The property_changed() function would fill in the fetch_scanline slot
with either an architecture specific fetcher or a delegate call that
would call fetch_scanline() for the next image in the fallback chain.
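The property_changed() logic can be sketched as follows, assuming only two levels: a hypothetical architecture-specific implementation that has its own fetcher only for r5g6b5 (a scalar stand-in for NEON code), and a generic one that handles every format in the sketch. All names are illustrative. Both images share the same common struct, so when a property such as the format changes, arch_property_changed() first refreshes the fallback image, then installs either its own fetcher or delegate_fetch(), which forwards to the next image in the chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; the common struct keeps only what this sketch needs. */
typedef enum { FORMAT_A8R8G8B8, FORMAT_R5G6B5 } format_t;

typedef struct
{
    format_t    format;
    int         width;   /* pixels per row; rowstride == width */
    const void *bits;
} image_common_t;

typedef struct image image_t;
typedef void (*fetch_scanline_t) (image_t *image, int x, int y,
                                  int width, uint32_t *buffer);

struct image
{
    const image_common_t *common;
    fetch_scanline_t      fetch_scanline;
    void                (*property_changed) (image_t *image);
    image_t              *fallback;  /* next image in the chain, or NULL */
};

static uint32_t
expand_565 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Generic fetcher: understands every format in this sketch. */
static void
generic_fetch (image_t *image, int x, int y, int width, uint32_t *buffer)
{
    const image_common_t *c = image->common;
    size_t start = (size_t) y * c->width + x;

    for (int i = 0; i < width; i++)
    {
        if (c->format == FORMAT_R5G6B5)
            buffer[i] = expand_565 (((const uint16_t *) c->bits)[start + i]);
        else
            buffer[i] = ((const uint32_t *) c->bits)[start + i];
    }
}

/* Stand-in for an architecture-specific (e.g. NEON) r5g6b5 fetcher. */
static void
arch_fetch_r5g6b5 (image_t *image, int x, int y, int width, uint32_t *buffer)
{
    const uint16_t *row = (const uint16_t *) image->common->bits
                          + (size_t) y * image->common->width;

    for (int i = 0; i < width; i++)
        buffer[i] = expand_565 (row[x + i]);
}

/* Delegate: forward the call to the next image in the fallback chain. */
static void
delegate_fetch (image_t *image, int x, int y, int width, uint32_t *buffer)
{
    image->fallback->fetch_scanline (image->fallback, x, y, width, buffer);
}

static void
generic_property_changed (image_t *image)
{
    image->fetch_scanline = generic_fetch;
}

static void
arch_property_changed (image_t *image)
{
    /* Keep the fallback's cached fetcher current before relying on it. */
    image->fallback->property_changed (image->fallback);

    if (image->common->format == FORMAT_R5G6B5)
        image->fetch_scanline = arch_fetch_r5g6b5;
    else
        image->fetch_scanline = delegate_fetch;
}
```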
As with the implementation delegates, if you can find a simpler setup,
I wouldn't be opposed to it, as long as it can do these things:
- Allows fallbacks from SSE2->MMX->fast->generic
- Doesn't rule out fetchers for gradients
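The SSE2->MMX->fast->generic requirement can be illustrated with a hypothetical chain in which each level builds its own image around the image built by the next level, and either installs its own fetcher or delegates. The per-level format support is made up. The fetchers only tag the buffer with the level that ran (4 = SSE2 down to 1 = generic) instead of converting pixels, so only the dispatch is shown. None of these names are pixman API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of an SSE2 -> MMX -> fast -> generic fetcher chain. */
typedef enum { FORMAT_A8R8G8B8, FORMAT_R5G6B5, FORMAT_A8 } format_t;

typedef struct image image_t;
typedef void (*fetch_scanline_t) (image_t *image, int width, uint32_t *buffer);

typedef struct implementation implementation_t;
struct implementation
{
    const implementation_t *fallback;                  /* NULL for generic */
    fetch_scanline_t      (*lookup) (format_t format); /* NULL: no own fetcher */
};

struct image
{
    format_t          format;
    fetch_scanline_t  fetch_scanline;
    image_t          *fallback;      /* image built by the next level */
};

/* Dummy fetchers: tag the buffer with the level that actually ran. */
static void
fill (uint32_t *buffer, int width, uint32_t tag)
{
    for (int i = 0; i < width; i++)
        buffer[i] = tag;
}

static void sse2_fetch (image_t *im, int w, uint32_t *b)    { (void) im; fill (b, w, 4); }
static void mmx_fetch (image_t *im, int w, uint32_t *b)     { (void) im; fill (b, w, 3); }
static void fast_fetch (image_t *im, int w, uint32_t *b)    { (void) im; fill (b, w, 2); }
static void generic_fetch (image_t *im, int w, uint32_t *b) { (void) im; fill (b, w, 1); }

/* Made-up format support per level; generic must handle everything. */
static fetch_scanline_t sse2_lookup (format_t f)    { return f == FORMAT_A8R8G8B8 ? sse2_fetch : NULL; }
static fetch_scanline_t mmx_lookup (format_t f)     { return f == FORMAT_R5G6B5 ? mmx_fetch : NULL; }
static fetch_scanline_t fast_lookup (format_t f)    { return f != FORMAT_A8 ? fast_fetch : NULL; }
static fetch_scanline_t generic_lookup (format_t f) { (void) f; return generic_fetch; }

static const implementation_t generic_imp = { NULL, generic_lookup };
static const implementation_t fast_imp    = { &generic_imp, fast_lookup };
static const implementation_t mmx_imp     = { &fast_imp, mmx_lookup };
static const implementation_t sse2_imp    = { &mmx_imp, sse2_lookup };

/* Delegate: forward to the image built by the next level. */
static void
delegate_fetch (image_t *image, int width, uint32_t *buffer)
{
    image->fallback->fetch_scanline (image->fallback, width, buffer);
}

/* Each level's image wraps the image built by the level below it. */
static image_t *
create_image (const implementation_t *imp, format_t format)
{
    image_t *image = calloc (1, sizeof *image);
    fetch_scanline_t own;

    if (!image)
        return NULL;
    image->format = format;

    if (imp->fallback)
    {
        image->fallback = create_image (imp->fallback, format);
        if (!image->fallback)
        {
            free (image);
            return NULL;
        }
    }

    own = imp->lookup (format);
    image->fetch_scanline = own ? own : delegate_fetch;
    return image;
}
```

Creating an r5g6b5 image from sse2_imp yields a top-level image whose fetch_scanline is delegate_fetch, and the call ends up in mmx_fetch. Nothing in the chain assumes a bits image, so a lookup that also keyed on the image type could install gradient fetchers the same way.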