[Pixman] [PATCH 2/2] Delete simple repeat code

Fri Dec 17 09:17:37 PST 2010

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> That's a good idea, but call overhead of the existing fast paths may be too
> high. Even right now, just adding a new (mostly empty) implementation shows a 
> small, but measurable performance regression which I observed when trying
> to benchmark MIPS optimizations. This happens because we are walking through
> combiners delegate chain for each scanline. I personally think that
> 'delegate_combine_32' is evil and should be eventually removed. And increasing  
> call overhead in any other ways is better to be avoided too.
> 
> More lightweight fast paths with smaller call overhead are probably nothing
> else than the existing combiners. Surely the way how they are looked up can be 
> changed to be the same as fast paths. 

I agree that walking the delegate chain on every scanline is evil. It
wouldn't be very hard to just look up a combiner outside the loop and
then call it directly. With a couple of flags

        COMBINE_NARROW, COMBINE_COMPONENT_ALPHA

it would be possible to get rid of all the combine_32/64/ca/u calls
and just look up a combiner in a 

        pixman_combiner_t combiners[4][PIXMAN_N_OPERATORS]

table.
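
A minimal sketch of that idea, to make the table lookup concrete. Every
name below except the two flags is hypothetical and not pixman's actual
API: the combiner signature, the two-entry operator enum and the
composite_rows() helper are made up for illustration, and the void
pointers stand in for "uint32_t or uint64_t depending on COMBINE_NARROW".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two flags from the mail; their bit values are an assumption. */
enum { COMBINE_NARROW = 1 << 0, COMBINE_COMPONENT_ALPHA = 1 << 1 };

/* Hypothetical stand-in for the real operator list. */
enum { OP_SRC, OP_OVER, N_OPERATORS };

/* Hypothetical combiner signature; buffers hold 32- or 64-bit pixels. */
typedef void (*combiner_t) (void *dest, const void *src,
                            const void *mask, int width);

/* Narrow, unified-alpha SRC with the mask ignored: just copy the pixels. */
static void
combine_src_u (void *dest, const void *src, const void *mask, int width)
{
    uint32_t *d = dest;
    const uint32_t *s = src;
    int i;

    (void) mask;
    for (i = 0; i < width; i++)
        d[i] = s[i];
}

/* One table indexed by the flag bits and the operator.
 * Only one entry is filled in here; the rest stay NULL. */
static const combiner_t combiners[4][N_OPERATORS] = {
    [COMBINE_NARROW] = { [OP_SRC] = combine_src_u },
};

static combiner_t
lookup_combiner (int op, unsigned flags)
{
    return combiners[flags & 3][op];
}

/* The lookup happens once, outside the scanline loop. */
static void
composite_rows (int op, unsigned flags, uint32_t *dest,
                const uint32_t *src, int width, int height, int stride)
{
    combiner_t combine = lookup_combiner (op, flags);
    int y;

    assert (combine != NULL);
    for (y = 0; y < height; y++)
        combine (dest + y * stride, src + y * stride, NULL, width);
}
```

With this layout the per-scanline cost is a single indirect call, and
nothing walks a delegate chain inside the loop.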

> But from the practical point of view, the 
> thing that is really worth optimizing here is a8 mask, because it is important 
> for all platforms (565 is only relevant for the embedded platforms, the other 
> formats are probably not relevant at all). Could the standard combiners be 
> changed to work with a8 mask, instead of a8x8x8x8 as it is now?

It probably could be done, but it wouldn't be straightforward. All
images would need to gain the ability to fetch in a8 format as well as
in a8r8g8b8, and unless the wide combiners were generated differently
from the narrow ones, a16 versions would be needed too.
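
To illustrate the difference, here is a rough sketch of the same masking
step done both ways. It assumes that an a8 mask expanded to a8r8g8b8
carries its alpha in the top byte with the colour channels zero, and
that a unified-alpha combiner only looks at that top byte. All function
names are made up for the example and are not pixman's.

```c
#include <stdint.h>

/* Rounded multiply of two 8-bit values, i.e. a * b / 255. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scale all four channels of an a8r8g8b8 pixel by the 8-bit value m. */
static uint32_t
scale_pixel (uint32_t p, uint8_t m)
{
    return ((uint32_t) mul_un8 ((uint8_t) (p >> 24), m) << 24) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 16), m) << 16) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 8), m) << 8) |
           (uint32_t) mul_un8 ((uint8_t) p, m);
}

/* Expand an a8 scanline into a temporary a8r8g8b8 buffer
 * (assumed layout: alpha in the top byte, colour channels zero). */
static void
expand_a8 (const uint8_t *mask, uint32_t *out, int width)
{
    int i;

    for (i = 0; i < width; i++)
        out[i] = (uint32_t) mask[i] << 24;
}

/* Masking with an a8r8g8b8 mask: 4 bytes of mask read per pixel. */
static void
mask_src_a8r8g8b8 (uint32_t *src, const uint32_t *mask, int width)
{
    int i;

    for (i = 0; i < width; i++)
        src[i] = scale_pixel (src[i], (uint8_t) (mask[i] >> 24));
}

/* Masking with an a8 mask directly: 1 byte of mask read per pixel
 * and no temporary buffer. */
static void
mask_src_a8 (uint32_t *src, const uint8_t *mask, int width)
{
    int i;

    for (i = 0; i < width; i++)
        src[i] = scale_pixel (src[i], mask[i]);
}
```

Both paths give the same pixels; the second one skips the expansion
pass, which is the saving being asked about. It says nothing about the
fetcher changes or the a16 variants mentioned above.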

Soren