[Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

Soeren Sandmann sandmann at daimi.au.dk
Wed Sep 15 05:06:44 PDT 2010


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Tuesday 14 September 2010 19:10:00 Soeren Sandmann wrote:
> > On the other hand, there are drawbacks to having the check in each
> > fast path too:
> > 
> > - It adds a bunch of both source and binary code that nobody will ever
> >   run except through the test suite. It seems kind of pointless to
> >   have an optimization for such an uncommon case.
> > 
> > - The check will happen for each invocation of the composite
> >   operation, whereas with the flag, the computation can be cached in
> >   the source image.
> > 
> > I'm not terribly concerned about wasting flag bits. There are still
> > 25% of them left. In 0.22 I think we can get rid of the
> > NEED_WORKAROUND one, and the NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, and
> > NO_ACCESSORS could be collapsed to one NO_CRACK flag.
> 
> So it is decided that we are adding a new flag then?

Well, I'm not the one doing the work, and I wouldn't *reject* a patch
that did the testing within the fast path.

However, I do think that a flag would be a better way to do it.
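
As a rough illustration of the difference, here is a minimal sketch
that is not pixman's actual code. The struct, the flag name
FLAG_NOT_SUPERLUMINESCENT, and the helper functions are all
hypothetical, and it assumes the property being tested is whether a
premultiplied a8r8g8b8 solid color has any channel larger than its
alpha. The test runs once, when the color is set, and each composite
call only reads the cached bit.

```c
#include <stdint.h>

/* Hypothetical flag bit; pixman's real flag names differ. */
#define FLAG_NOT_SUPERLUMINESCENT (1u << 0)

/* Hypothetical stand-in for a solid source image. */
typedef struct {
    uint32_t color;   /* premultiplied a8r8g8b8 */
    uint32_t flags;   /* computed once, when the color is set */
} solid_image_t;

/* A premultiplied color is superluminescent if any color channel
 * exceeds its alpha channel. */
static int is_superluminescent(uint32_t c)
{
    uint32_t a = c >> 24;

    return ((c >> 16) & 0xff) > a ||
           ((c >> 8) & 0xff) > a ||
           (c & 0xff) > a;
}

/* The check runs here, once per color change ... */
static void solid_image_set_color(solid_image_t *img, uint32_t color)
{
    img->color = color;
    img->flags = is_superluminescent(color) ? 0 : FLAG_NOT_SUPERLUMINESCENT;
}

/* ... so each composite call only has to test a cached bit. */
static int can_use_fast_variant(const solid_image_t *img)
{
    return (img->flags & FLAG_NOT_SUPERLUMINESCENT) != 0;
}
```

With the check inside the fast path instead, is_superluminescent()
would run on every composite call, and every fast path wanting the
optimization would need its own fallback branch.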

> > Or they can be extended to 64 bits.
> 
> IMHO, extending to 64 bits should be only done as the last resort. It's going
> to be slower on 32-bit systems. How much and whether it is tolerable, that's
> another question.

It would be, yes, but there is a real danger in obsessing about these
microoptimizations. Consider the important n_8_8888/0565 fast
paths. They are used for two things mainly: glyphs and geometry.

- Typically, drawing text results in multiple calls to ADD and a call
  to n_8_8888. That means we gain a lot from the fast path cache and
  from fast fast-path lookup in general.

  But to really make a difference with glyphs, it would be better
  to add explicit support for the Render glyph mechanism by adding API
  to composite an entire string so that the lookup time becomes
  irrelevant compared to the actual compositing. 

- Geometry is the other thing that typically ends up in this fast
  path. This is where someone uses cairo to draw some shape with a
  solid color.

  This is also a case where the higher level optimization of adding a
  polygon rasterizer to pixman would provide a lot more bang for the
  buck than shaving nanoseconds off of pixman_image_composite32().

  That would be much faster for things like large diagonal lines,
  where we end up calling n_8_8888 on a mask which is full of zeros.

  With a polygon image, we could composite and rasterize in one go,
  which would save a *ton* of memory accesses.
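
To get a feel for how sparse such a mask is, here is a rough sketch
under simplifying assumptions: a one-pixel-wide, non-antialiased
diagonal rasterized into a square a8 mask covering its bounding box.
The function name is hypothetical and this is not how cairo or pixman
rasterize lines.

```c
#include <stdint.h>
#include <stdlib.h>

/* Build a size x size a8 mask containing a one-pixel diagonal and
 * return how many mask bytes are nonzero. Returns 0 for size <= 0 or
 * on allocation failure. */
static size_t diagonal_mask_nonzero(int size)
{
    size_t n, i, nonzero = 0;
    uint8_t *mask;

    if (size <= 0)
        return 0;

    n = (size_t)size * (size_t)size;
    mask = calloc(n, 1);
    if (!mask)
        return 0;

    /* One fully covered pixel per scanline. */
    for (i = 0; i < (size_t)size; i++)
        mask[i * (size_t)size + i] = 0xff;

    /* A mask-driven compositor has to read every one of these bytes,
     * even though almost all of them are zero. */
    for (i = 0; i < n; i++)
        if (mask[i])
            nonzero++;

    free(mask);
    return nonzero;
}
```

For a 1000x1000 bounding box this gives 1000 nonzero bytes out of
1,000,000. An antialiased line would touch a few more pixels per
scanline, but the mask stays overwhelmingly zero.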

Basically, I think it's usually better to spend time on the
high-level optimizations "make glyphs fast" or "make shapes fast"
rather than worry about whether the fast path look-up is doing 32 bit
comparisons or 64 bit comparisons.

Or alternatively, profile the code and fix the things that stand
out. For example, it turned out that the analyze_extents() and
compute_samples_extent() functions that I added recently are actually
showing up on profiles.

     http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=analyze-extents&id=b07cee6d1e463d782f243e511df02de837c18c96

This fix made one of the cairo traces about one second faster than
current master (and about 0.4 seconds faster than 0.18.0).

> > > > That would allow similar optimizations for the n_8_565 case and
> > > > probably the n_8888_8888_ca() case as well.
> > > 
> > > Yes, and also 'over_n_8888' could make use of this optimization (if C
> > > fast path function even gets implemented for it).
> > 
> > Existing fast path operations that can take advantage of this:
> > 
> >         n_8_0565
> >         n_8_0888
> >         n_8_8888
> >         n_1_8888
> >         n_1_0565
> >         n_8888_8888_ca
> >         n_8888_0565_ca
> >         x888_8_8888     (because x888 is never superluminescent)
> > 
> > So if the optimization were fully implemented it would be quite a bit
> > of dead code.
> 
> What about changing 'general_composite_rect' function from static to global 
> and using it as a fallback in such cases where the fast path function decides
> that "oops, I actually can't or don't want to do this"? At least this may be
> also handy for debugging or developing new code when some complex fast path
> function is only partially implemented initially.

If it turns out that the flags are not sufficient to describe the fast
paths anymore, I think that should be fixed in a general way, and not
by adding an ad hoc call to general_composite_rect(), preventing the
other fast paths from potentially being called.

Pixman used to have extremely complicated logic to select fast paths
that would eventually fall back to what was then fbCompositeGeneral().
Adding the fast path tables made that code much easier to deal with.
In fact, it turned out that a couple of fast paths were being
incorrectly skipped just because no one could understand what was
going on.


Soren

