[Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

Siarhei Siamashka siarhei.siamashka at gmail.com
Tue Sep 14 05:26:27 PDT 2010

On Tuesday 14 September 2010 08:53:37 Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > +/* A variant of 'over', which works faster for non-additive blending on the
> > + * platforms which do not have special instructions for saturated addition
> > + */
> > +static force_inline uint32_t
> > +over_a (uint32_t src, uint32_t dest, pixman_bool_t additive_blending)
> > +{
> > +    uint32_t a = ~src >> 24;
> > +    if (additive_blending)
> > +    {
> > +	UN8x4_MUL_UN8_ADD_UN8x4 (dest, a, src);
> > +	return dest;
> > +    }
> > +    else
> > +    {
> > +	UN8x4_MUL_UN8 (dest, a);
> > +	return dest + src;
> > +    }
> > +}
> Is there any reason to not just add a boolean "additive_blending" to
> the existing force_inline over() function?

No particular reason. The patch was just self-contained and a bit less
intrusive this way (in the sense of having fewer places in the code changed). It
was mostly intended as a preview of the idea. The final implementation can
indeed be done so that it blends better with the rest of the code.

So does it make sense to split the patch into parts, introducing this third
argument for the 'over' function first?
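
For illustration, here is a minimal standalone sketch of what 'over' with such
a third argument could look like. It does not use pixman's UN8x4_* macros; the
helper names (mul_un8, mul_un8x4, add_sat_un8x4, over_sketch) and the
parameter name are hypothetical and only meant to show the two code paths:

```c
#include <stdbool.h>
#include <stdint.h>

/* Rounded x * a / 255 for 8-bit x and a. */
static inline uint32_t mul_un8(uint32_t x, uint32_t a)
{
    uint32_t t = x * a + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Scale all four 8-bit channels of a packed pixel by a (0..255). */
static inline uint32_t mul_un8x4(uint32_t p, uint32_t a)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
        r |= mul_un8((p >> shift) & 0xff, a) << shift;
    return r;
}

/* Per-channel saturating add of two packed pixels. */
static inline uint32_t add_sat_un8x4(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((x >> shift) & 0xff) + ((y >> shift) & 0xff);
        r |= (s > 0xff ? 0xff : s) << shift;
    }
    return r;
}

/* Hypothetical 'over' with a flag selecting the saturating path. */
static inline uint32_t over_sketch(uint32_t src, uint32_t dest,
                                   bool super_luminescent)
{
    uint32_t ia = ~src >> 24;            /* 255 - source alpha */
    dest = mul_un8x4(dest, ia);
    if (super_luminescent)
        return add_sat_un8x4(dest, src); /* channels may overflow: clamp */
    /* Each dest channel is <= 255 - alpha and each src channel is <= alpha,
     * so the per-channel sums fit in 8 bits and a plain add is enough. */
    return dest + src;
}
```

When the flag is a compile-time constant at the call site and the function is
inlined, the compiler can drop the unused branch, which is the whole point of
passing it as an argument.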

> It might also be interesting to add the check as a new
> NOT_SUPER_LUMINESCENT flag and then simply require it for the source
> for all the over_n_*() functions.

I see many reasons *not* to add it as a new flag:

1. It takes one extra flag bit. There are already 24 bits used, with only 8 
remaining. We still need some flag(s) for rotation transforms:
I expect that compacting bits later may turn out to be tricky, so it may be
wise not to waste them in the first place. Extending the flag bits to a 64-bit
variable is possible, but may reduce performance.

2. After introducing this bit, every compositing operation with a solid
source will calculate this flag, spending some time on it. But the flag
is not needed for many operators (SRC, for example).
Also, it is only useful for C fast paths and simple SIMD-incapable
processors; everyone else will just take a tiny performance hit.

The 'last mile' check as implemented in my patch should be fine as far as
performance is concerned. The only drawback is that whoever implements the
fast path functions will be forced to handle all possible types of input data,
and cannot be lazy by providing just the NOT_SUPER_LUMINESCENT variant and
relying on pixman to fall back to something else when needed.
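
A rough standalone sketch of such a 'last mile' check, under the assumption of
premultiplied a8r8g8b8 pixels and a single scanline. All names here
(mul_add_sat, is_super_luminescent, row_inner, over_n_8_8888_row) are
hypothetical and this is not the code from the patch. The solid source is
tested once per call, then the same inlined loop is used with a constant flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded x * a / 255 for 8-bit x and a. */
static inline uint32_t mul_un8(uint32_t x, uint32_t a)
{
    uint32_t t = x * a + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Per channel: p * a / 255 + q, clamped to 255. */
static inline uint32_t mul_add_sat(uint32_t p, uint32_t a, uint32_t q)
{
    uint32_t r = 0;
    for (int s = 0; s < 32; s += 8) {
        uint32_t c = mul_un8((p >> s) & 0xff, a) + ((q >> s) & 0xff);
        r |= (c > 0xff ? 0xff : c) << s;
    }
    return r;
}

/* True if any color channel exceeds alpha. */
static bool is_super_luminescent(uint32_t p)
{
    uint32_t a = p >> 24;
    return ((p >> 16) & 0xff) > a || ((p >> 8) & 0xff) > a || (p & 0xff) > a;
}

/* Shared loop body; 'clamp' is a constant at both call sites below. */
static inline void row_inner(uint32_t src, const uint8_t *mask,
                             uint32_t *dst, size_t w, bool clamp)
{
    for (size_t i = 0; i < w; i++) {
        uint32_t m = mask[i];
        if (m == 0)
            continue;
        uint32_t s = mul_add_sat(src, m, 0);   /* src IN mask */
        uint32_t ia = ~s >> 24;                /* 255 - alpha of s */
        if (clamp)
            dst[i] = mul_add_sat(dst[i], ia, s);
        else
            dst[i] = mul_add_sat(dst[i], ia, 0) + s; /* cannot overflow */
    }
}

/* The check is done once here, not per pixel and not via a flag bit. */
static void over_n_8_8888_row(uint32_t src, const uint8_t *mask,
                              uint32_t *dst, size_t w)
{
    if (is_super_luminescent(src))
        row_inner(src, mask, dst, w, true);
    else
        row_inner(src, mask, dst, w, false);
}
```

Scaling the source by the mask scales alpha and color by the same factor, so
a source that is not super-luminescent stays that way after the IN step and
the plain add remains safe.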

BTW, I like this 'super-luminescent' term :) I tried to search for the 
information about the case when "color components exceed alpha in premultiplied 
format", and it looked like many (game developers) know about this thing and
its features, but seemed like nobody had a clear single-word definition for it.
Searching for "super-luminescent premultiplied" gives some references, all in 
cairo and freedesktop.org context. Anyway, let's indeed call this thing
'super-luminescent'. I think I need to update comments in the patch and also in
the commit message to use it instead of 'additive blending', which I took from:

> That would allow similar optimizations for the n_8_565 case and probably the
> n_8888_8888_ca() case as well.

Yes, and 'over_n_8888' could also make use of this optimization (if a C fast
path function ever gets implemented for it).

> The flag could be set for all the gradients and any time an image is
> opaque.

I'm not quite sure about how useful this flag could be for gradients (it would 
have to be somehow propagated to the scanline combiner function?).

But another important operation is over_8888_8888. And it is hard to do
anything with it because we don't know if there are any super-luminescent
pixels in the source image. Maybe it makes sense to reconsider how
super-luminescent colors are handled in general? When discussing it on IRC the
other day, there were even concerns about where such pixels could possibly
come from and whether they are even used in cairo in any way.

But this stuff is only important for the C implementation and simple processors.
So for now I would just go after the simple C fast path functions like
over_n_8_8888/over_n_8_0565 and maybe add the rest of the ideas to the TODO list.
Probably some people are more motivated to improve pixman performance on
simple processors? I'm adding Georgi Beloev to CC just in case because he
seems to be interested in MIPS32R2.

Best regards,
Siarhei Siamashka