[Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

Soeren Sandmann sandmann at daimi.au.dk
Tue Sep 14 09:10:00 PDT 2010


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> So does it make sense to split the patch into parts, introducing this third
> argument for 'over' function first?

Yeah, makes sense.

> > It might also be interesting to add the check as a new
> > NOT_SUPER_LUMINESCENT flag and then simply require it for the source
> > for all the over_n_*() functions.
> 
> I see many reasons *not* to add it as a new flag:
> 
> 1. It takes one extra flag bit. There are already 24 bits used, with only 8 
> remaining. We still need some flag(s) for rotation transforms:
> http://lists.freedesktop.org/archives/pixman/2010-August/000420.html
> I expect that compacting bits later may turn out to be tricky, so it may be 
> wise not to waste them in the first place. Extending flag bits to 64-bit
> variable is possible, but may reduce performance.
> 
> 2. After introducing this bit, every compositing operation with a solid
> source will do calculation for this flag, spending some time on it. But
> calculation of this flag is not needed for many operators (SRC for example). 
> Also it is only useful exclusively for C fast paths and simple SIMD-incapable 
> processors, everyone else will just take a tiny performance hit.
> 
> The 'last mile' check as implemented in my patch should be fine as far as 
> performance is concerned. The only drawback is that the one who implements the
> fast path functions, will be forced to handle all possible types of input data.
> And not be lazy providing just NOT_SUPER_LUMINESCENT operation only, relying on
> pixman to fall back to something else when needed.

On the other hand, there are drawbacks to having the check in each
fast path too:

- It adds a bunch of both source and binary code that nobody will ever
  run except through the test suite. It seems kind of pointless to
  have an optimization for such an uncommon case.

- The check will happen for each invocation of the composite
  operation, whereas with the flag, the computation can be cached in
  the source image.
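
To illustrate the second point, here is a minimal sketch of computing
the property once and caching it as a flag bit. Everything in it is
hypothetical: solid_image_t, the bit position of
FLAG_NOT_SUPER_LUMINESCENT and the function names are made up for
illustration and are not pixman's actual internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real bit position would be chosen in pixman. */
#define FLAG_NOT_SUPER_LUMINESCENT (1u << 0)

/* Hypothetical stand-in for a solid source image. */
typedef struct
{
    uint32_t color;  /* premultiplied a8r8g8b8 */
    uint32_t flags;  /* cached properties of the image */
} solid_image_t;

/* Done once, when the image's properties are computed. */
static void
solid_image_compute_flags (solid_image_t *image)
{
    uint32_t a = image->color >> 24;
    uint32_t r = (image->color >> 16) & 0xff;
    uint32_t g = (image->color >> 8) & 0xff;
    uint32_t b = image->color & 0xff;

    image->flags &= ~FLAG_NOT_SUPER_LUMINESCENT;
    if (r <= a && g <= a && b <= a)
        image->flags |= FLAG_NOT_SUPER_LUMINESCENT;
}

/* Done per composite request: a single bit test, no per-channel work. */
static bool
source_allows_simplified_over (const solid_image_t *image)
{
    return (image->flags & FLAG_NOT_SUPER_LUMINESCENT) != 0;
}
```

With the check inside each fast path, the per-channel comparison would
instead run on every invocation.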

I'm not terribly concerned about wasting flag bits. There are still
25% of them left. In 0.22 I think we can get rid of the
NEED_WORKAROUND one, and the NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, and
NO_ACCESSORS could be collapsed to one NO_CRACK flag. Or the flags
could be extended to 64 bits.

> BTW, I like this 'super-luminescent' term :) I tried to search for the 
> information about the case when "color components exceed alpha in premultiplied 
> format", and it looked like many (game developers) know about this thing and
> its features, but seemed like nobody had a clear single-word definition for it.
> Searching for "super-luminescent premultiplied" gives some references, all in 
> cairo and freedesktop.org context. Anyway, let's indeed call this thing
> 'super-luminescent'. I think I need to update comments in the patch and also in
> the commit message to use it instead of 'additive blending', which I took from:
> http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Premultiplied%20alpha]]

I think the word originally comes from Jim Blinn's book "Dirty Pixels"
[1], except that apparently he calls them "superluminous". I highly
recommend that book to anyone interested in background material on
Render, pixman and cairo.
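
For reference, the property under discussion ("color components exceed
alpha in premultiplied format") can be written down as a small check.
This is only a sketch: is_superluminescent() is a hypothetical helper,
not an existing pixman function, and it assumes a premultiplied
a8r8g8b8 pixel packed in a uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if any color channel of a
 * premultiplied a8r8g8b8 pixel is larger than its alpha channel.
 */
static bool
is_superluminescent (uint32_t argb)
{
    uint32_t a = argb >> 24;
    uint32_t r = (argb >> 16) & 0xff;
    uint32_t g = (argb >> 8) & 0xff;
    uint32_t b = argb & 0xff;

    return r > a || g > a || b > a;
}
```

For example, 0x80402010 passes as an ordinary premultiplied color,
while 0x80ff0000 is superluminescent because red (0xff) exceeds alpha
(0x80). A pixel whose alpha is 0xff can never be superluminescent,
since no 8-bit channel can exceed 0xff.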

> > That would allow similar optimizations for the n_8_565 case and probably the
> > n_8888_8888_ca() case as well.
> 
> Yes, and also 'over_n_8888' could make use of this optimization (if C fast path 
> function even gets implemented for it).

Existing fast path operations that can take advantage of this:

        n_8_0565
        n_8_0888
        n_8_8888
        n_1_8888
        n_1_0565
        n_8888_8888_ca
        n_8888_0565_ca
        x888_8_8888     (because x888 is never superluminescent)

So if the optimization were fully implemented it would be quite a bit
of dead code.

> > The flag could be set for all the gradients and any time an image
> > is opaque.
> 
> I'm not quite sure about how useful this flag could be for gradients (it would 
> have to be somehow propagated to the scanline combiner function?).

Right, it would have to be propagated to the scanline combiner to be
useful. 


Soren


[1] http://www.amazon.com/Jim-Blinns-Corner-Kaufmann-Computer/dp/1558604553
