[Pixman] Pixman glyph compositing

Wed Jan 23 15:43:31 PST 2013

David Herrmann <dh.herrmann at googlemail.com> writes:

> While working on kmscon the main rendering task I am faced with is
> blending a glyph into the main framebuffer with a constant foreground
> and background color. The code I have been using is a per-pixel
> blending operation on each color value:
>
> For each pixel "i" I do:
>   r = alpha[i] * foreground.r + (255 - alpha[i]) * background.r
>   g = alpha[i] * foreground.g + (255 - alpha[i]) * background.g
>   b = alpha[i] * foreground.b + (255 - alpha[i]) * background.b
>   r /= 255;
>   g /= 255;
>   b /= 255;
>   dst[i] = (r << 16) | (g << 8) | b;
>
> So I have an 8bit alpha channel from the glyph as input and an xrgb32
> output framebuffer. The 24bit foreground/background values are
> constant during a single blending operation.
>
> I already optimized this by special-casing alpha[i] == 0 or 255 and I

Special-casing is usually a win only if it avoids a memory access, but
that's not the case here, since you don't read the destination at all.

> changed the division to 256 instead of 255. However, I was wondering
> whether pixman can provide a better alternative. Unfortunately, the
> fastest code I could come up with was (using shadow-buffer):
>
> pixman_fill(shadow, background);
> pixman_composite(OVER, foreground, alpha, shadow);
> pixman_blt(shadow, dst);
>
> I use a shadow buffer as I _really_ want to avoid to composite
> directly into the hardware buffer (which is in most cases way slower
> than the extra pixman_blt). However, this scenario requires writing
> the data three times and even reading it during the composite
> operation. But still, thanks to pixman-optimizations, this turns out
> to be almost exactly as fast as my own trivial implementation. So I
> was wondering whether anyone has ideas how to speed this up?
>
> Is there a way to perform this operation with a single pixman call?

If bg happens to be black, then the whole thing could be done with

       composite (SRC, fg, alpha, hardware_buffer)

You could speed this up by caching the glyphs in a pixman_glyph_cache_t
structure and then using pixman_composite_glyphs(), or if you are sure
that your glyphs will never overlap each other,
pixman_composite_glyphs_no_mask().

But in the general case, I don't think it's possible to do this in one
pass with the current pixman API.
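
To make the black-background case concrete, here is a minimal per-channel
sketch in plain C. It is not pixman code; the helper names (div_255,
blend_general, masked_src) are made up for illustration, and the rounding
is an assumption. With bg = 0 the background term of the blend vanishes,
so what remains is the foreground scaled by the glyph alpha, which is what
a solid source composited through a mask with SRC produces.

```c
#include <stdint.h>

/* Rounded division by 255, valid for 0 <= x <= 255 * 255. */
static uint32_t div_255(uint32_t x)
{
    uint32_t t = x + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* General case: one channel of fg blended over a constant bg. */
static uint32_t blend_general(uint32_t a, uint32_t fg, uint32_t bg)
{
    return div_255(a * fg + (255 - a) * bg);
}

/* Solid source seen through the alpha mask: no background term at all. */
static uint32_t masked_src(uint32_t a, uint32_t fg)
{
    return div_255(a * fg);
}
```

For any alpha and foreground value, blend_general(a, fg, 0) equals
masked_src(a, fg). As soon as bg is non-zero the two differ, which is why
the general case needs the extra pass.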

> If not, are there any other optimizations I should consider?

Some random comments:

- A correctly rounded x / 255, for x in the range 0 to 255 * 255, can
  be computed without a division:

      t = x + 0x80;
      return (t + (t >> 8)) >> 8;

- If you want to stick with a division by 256, you may want to add 0xff
  before shifting. That way 0xff * 0xff maps to 0xff instead of 0xfe.

- There are some macros in pixman/pixman-combine32.h that can do these
  types of computations on two channels at a time.

- If you are using a shadow buffer that is the size of the full screen,
  then it may be worth reducing it to the size of one glyph so that it
  fits in the L1 cache.

- You might want to consider caching pre-composited glyphs indexed by
  fg, bg under the assumption that the number of color combinations
  isn't that large.
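
The arithmetic suggestions above can be sketched in plain C as follows.
This is an illustration only: the function names are hypothetical, and
the two-channel blend is written in the spirit of the macros in
pixman-combine32.h, not copied from them. It assumes xrgb32 pixels with
the top byte zero in the constant colors.

```c
#include <stdint.h>

/* Rounded division by 255, valid for 0 <= x <= 255 * 255. */
static inline uint32_t div_255(uint32_t x)
{
    uint32_t t = x + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Division by 256, biased by 0xff so that 0xff * 0xff yields 0xff. */
static inline uint32_t div_256_biased(uint32_t x)
{
    return (x + 0xff) >> 8;
}

/* Reference blend of a single 8-bit channel. */
static inline uint32_t blend_channel(uint32_t a, uint32_t fg, uint32_t bg)
{
    return div_255(a * fg + (255 - a) * bg);
}

/*
 * Blend a whole xrgb32 pixel. Red and blue sit in separate 16-bit lanes
 * of one 32-bit word, so they are multiplied and divided together.
 * Because a + (255 - a) == 255, each lane stays below 65536 and never
 * carries into its neighbour.
 */
static inline uint32_t blend_xrgb(uint8_t a, uint32_t fg, uint32_t bg)
{
    uint32_t ia = 255 - a;

    /* Red and blue, two channels at once. */
    uint32_t rb = (fg & 0x00ff00ff) * a + (bg & 0x00ff00ff) * ia;
    rb += 0x00800080;
    rb = ((rb + ((rb >> 8) & 0x00ff00ff)) >> 8) & 0x00ff00ff;

    /* Green on its own. */
    uint32_t g = div_255(((fg >> 8) & 0xff) * a + ((bg >> 8) & 0xff) * ia);

    return rb | (g << 8);
}
```

For example, blend_xrgb(0x80, 0x00ffffff, 0x00000000) returns 0x00808080.
Since fg and bg are constant for a glyph, the masked values
(fg & 0x00ff00ff and so on) could be hoisted out of the per-pixel loop.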

Søren