[Pixman] Pixman glyph compositing

Thu Jan 24 04:16:50 PST 2013

Hi Søren

On Thu, Jan 24, 2013 at 12:43 AM, Søren Sandmann <sandmann at cs.au.dk> wrote:
> David Herrmann <dh.herrmann at googlemail.com> writes:
>
>> While working on kmscon the main rendering task I am faced with is
>> blending a glyph into the main framebuffer with a constant foreground
>> and background color. The code I have been using is a per-pixel
>> blending operation on each color value:
>>
>> For each pixel "i" I do:
>>   r = alpha[i] * foreground.r + (255 - alpha[i]) * background.r
>>   g = alpha[i] * foreground.g + (255 - alpha[i]) * background.g
>>   b = alpha[i] * foreground.b + (255 - alpha[i]) * background.b
>>   r /= 255;
>>   g /= 255;
>>   b /= 255;
>>   dst[i] = (r << 16) | (g << 8) | b;
>>
>> So I have an 8bit alpha channel from the glyph as input and an xrgb32
>> output framebuffer. The 24bit foreground/background values are
>> constant during a single blending operation.
>>
>> I already optimized this by special-casing alpha[i] == 0 or 255 and I
>
> This is usually a win if it avoids a memory access, but that's not the
> case here where you don't read the destination at all.

It's twice as fast with this optimization on my machine. That makes
sense considering that one of the two cases holds >99% of the time
when drawing a console (even with AA fonts).
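
For reference, a minimal sketch of what I mean by special-casing. The
helper names (blend_channel, blend_glyph_row) are made up for
illustration and are neither kmscon nor pixman functions. fg and bg are
assumed to be xrgb32 values with the top byte clear, and the rounded
division by 255 is one possible choice, not necessarily what my loop
used at the time:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: blend one 8-bit channel with coverage a (0..255).
 * The sum is at most 255 * 255, so the shift-based rounded division by
 * 255 is exact here. */
static inline uint32_t blend_channel(uint32_t fg, uint32_t bg, uint32_t a)
{
    uint32_t t = a * fg + (255 - a) * bg + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: blend n glyph pixels into an xrgb32 row.
 * Assumes fg and bg have a zero top byte. */
static void blend_glyph_row(uint32_t *dst, const uint8_t *alpha, size_t n,
                            uint32_t fg, uint32_t bg)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t a = alpha[i];

        if (a == 0) {
            dst[i] = bg;            /* fully transparent: plain store */
        } else if (a == 255) {
            dst[i] = fg;            /* fully opaque: plain store */
        } else {
            uint32_t r = blend_channel((fg >> 16) & 0xff, (bg >> 16) & 0xff, a);
            uint32_t g = blend_channel((fg >> 8) & 0xff, (bg >> 8) & 0xff, a);
            uint32_t b = blend_channel(fg & 0xff, bg & 0xff, a);
            dst[i] = (r << 16) | (g << 8) | b;
        }
    }
}
```

The two fast paths skip all multiplications, which is presumably where
the speed-up comes from, since only the anti-aliased edge pixels reach
the last branch.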

>> changed the division to 256 instead of 255. However, I was wondering
>> whether pixman can provide a better alternative. Unfortunately, the
>> fastest code I could come up with was (using shadow-buffer):
>>
>> pixman_fill(shadow, background);
>> pixman_composite(OVER, foreground, alpha, shadow);
>> pixman_blt(shadow, dst);
>>
>> I use a shadow buffer as I _really_ want to avoid compositing
>> directly into the hardware buffer (which is in most cases way slower
>> than the extra pixman_blt). However, this scenario requires writing
>> the data three times and even reading it during the composite
>> operation. But still, thanks to pixman-optimizations, this turns out
>> to be almost exactly as fast as my own trivial implementation. So I
>> was wondering whether anyone has ideas how to speed this up?
>>
>> Is there a way to perform this operation with a single pixman call?
>
> If bg happens to be black, then the whole thing could be done with
>
>        composite (SRC, fg, alpha, hardware_buffer)

Ah, right, that does work in many cases as the default background is
normally black. I will add that.

> You could speed this up by caching the glyphs in a pixman_glyphs_t
> structure and then using pixman_composite_glyphs(), or if you are sure
> that your glyphs will never overlap each other,
> pixman_composite_glyphs_no_mask().

I already cache the glyphs in a hash table, which works pretty well.
Also, "perf" shows me that the glyph lookup accounts for only 2% of
the rendering time. Does the pixman glyph cache do more than that? Any
other magic I am not aware of? Or is it a simple hash table? If so, I
will probably stick with my own implementation.

> But in the general case, I don't think it's possible to do this in one
> pass with the current pixman API.
>
>> If not, are there any other optimizations I should consider?
>
> Some random comments:
>
> - The x / 255 can be done with
>
>       t = x + 0x80
>       return (t + (t >> 8)) >> 8;

That's the stuff I was looking for. Thanks!
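
Written out as a function, this is how I read the trick. The name
div255_round is mine, and the input range is my assumption: x is a
product or blend of 8-bit values, so 0 <= x <= 255 * 255:

```c
#include <stdint.h>

/* Rounded division by 255 without a divide instruction.
 * Assumed input range: 0 <= x <= 255 * 255 (65025). */
static inline uint32_t div255_round(uint32_t x)
{
    uint32_t t = x + 0x80;          /* bias so the result rounds to nearest */
    return (t + (t >> 8)) >> 8;
}
```

Within that range the result should match (x + 127) / 255 for every
input, which is easy to check by looping over all 65026 values.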

> - If you want to stick with a division by 256, you may want to add 0xff
>   before shifting. That way 0xff * 0xff = 0xff instead of 0xfe.
>
> - There are some macros in pixman/pixman-combine32.h that can do these
>   types of computations on two channels at a time.

I have now tried UN8x4_MUL_UN8_ADD_UN8x4_MUL_UN8 (although 8x3 would be
fine, too) and it performs as well as the code I had before. But I have
only one test machine, so I cannot tell whether it performs better on
other architectures.
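
To illustrate the two-channels-at-a-time idea without pulling in
pixman's private header, here is a standalone sketch. It is not the
pixman macro itself; blend_xrgb32 and RB_MASK are names I made up, and
fg/bg are assumed to be xrgb32 values. Red and blue share one 32-bit
word, each in its own 16-bit lane, so a single multiply handles both:

```c
#include <stdint.h>

#define RB_MASK 0x00ff00ffu         /* red and blue, one per 16-bit lane */

/* Hypothetical helper: blend xrgb32 fg over bg with coverage a (0..255).
 * Each lane holds at most 255 * 255 + 0x80 + 254, which fits in 16 bits,
 * so the red and blue computations cannot carry into each other. */
static inline uint32_t blend_xrgb32(uint32_t fg, uint32_t bg, uint32_t a)
{
    uint32_t ia = 255 - a;

    /* Red and blue together, green on its own. */
    uint32_t rb = (fg & RB_MASK) * a + (bg & RB_MASK) * ia + 0x00800080u;
    uint32_t g  = ((fg >> 8) & 0xff) * a + ((bg >> 8) & 0xff) * ia + 0x80u;

    /* Rounded division by 255, applied per lane. */
    rb = ((rb + ((rb >> 8) & RB_MASK)) >> 8) & RB_MASK;
    g  = (g + (g >> 8)) >> 8;

    return rb | (g << 8);
}
```

This needs two multiplies per source colour instead of three. Whether
that is measurably faster than the per-channel version probably depends
on the architecture.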

> - If you are using a shadow buffer that is the size of the full screen,
>   then it may be interesting to reduce it to the size of one glyph so
>   that it fits in L1.
>
> - You might want to consider caching pre-composited glyphs indexed by
>   fg, bg under the assumption that the number of color combinations
>   isn't that large.

I want to avoid that. It wastes a lot of memory and works only if the
colors stay the same. The glyph cache is already big enough for an
emergency console that is rarely used.

Thanks for the ideas!
David