[cairo] cairo-gl glyph rendering performance

Tue Apr 19 15:53:02 PDT 2011

On Wed, 20 Apr 2011 01:30:58 +0300, Alexandros Frantzis <alexandros.frantzis at linaro.org> wrote:
> Hi all!
> 
> I have been investigating the cairo-gl glyph implementation to see if we
> can improve the glyph rendering performance.
> 
> I have found that one source of performance loss is the overzealous selection
> of the "via mask" path when rendering glyphs. When using the "via mask" path,
> glyphs are first rendered to a temporary surface which is then used as a mask
> to draw the glyphs on the final destination.
> 
> In the current code, one of the reasons to use the mask path is because the
> glyphs overlap. Is this valid?

Yes. It is deeply ingrained in the API that a single operation acts as a
single mask. If the overlapping glyphs of the glyph string were
rendered individually, you would operate twice on the overlapping
pixels. Hence we need to construct a mask for the entire string and
apply it as a single operation.
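
To make the double operation concrete, here is a minimal sketch on a
single row of alpha values. It is not cairo code: the function names, the
one-dimensional "surface" and the saturating-add mask accumulation are
assumptions chosen for illustration only.

```python
def over(src_alpha, coverage, dst_alpha):
    """Porter-Duff OVER on one alpha value, source scaled by mask coverage."""
    a = src_alpha * coverage
    return a + dst_alpha * (1.0 - a)


def render_individually(glyph_masks, src_alpha, width):
    """Composite each glyph onto the destination as its own operation."""
    dst = [0.0] * width
    for mask in glyph_masks:
        dst = [over(src_alpha, m, d) for m, d in zip(mask, dst)]
    return dst


def render_via_mask(glyph_masks, src_alpha, width):
    """Accumulate all glyphs into one mask, then composite exactly once."""
    combined = [0.0] * width
    for mask in glyph_masks:
        # Saturating add: coverage never exceeds 1.0 (illustrative assumption).
        combined = [min(1.0, c + m) for c, m in zip(combined, mask)]
    dst = [0.0] * width
    return [over(src_alpha, c, d) for c, d in zip(combined, dst)]


# Two hypothetical glyphs whose coverage overlaps at pixel index 2.
glyph_a = [1.0, 1.0, 1.0, 0.0, 0.0]
glyph_b = [0.0, 0.0, 1.0, 1.0, 1.0]

individually = render_individually([glyph_a, glyph_b], 0.5, 5)
via_mask = render_via_mask([glyph_a, glyph_b], 0.5, 5)

print(individually)  # the overlapping pixel is composited twice
print(via_mask)      # every covered pixel is composited once
```

With a half-transparent source, the overlapping pixel ends up darker than
its neighbours in the first case and uniform in the second.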

> In any case, the overlap detection test as implemented in
> _cairo_scaled_font_glyph_device_extents() is not suited for our needs for two
> reasons:

We know. Applying the KISS rule to avoid over-engineering.

> 1. The overlap detection algorithm checks the extents of each glyph against the
>    current total extent of previously processed glyphs. This works fine as long
>    as the glyph group is limited to a single line and drawn sequentially.

This is the *extremely* common case due to historical interface
limitations, i.e. code that has evolved from using X interfaces, or that
goes through pango, performs the line breaking itself.
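
For readers following along, the running-extents test described in point 1
can be sketched roughly as below. This is an assumption-laden
illustration, not the body of _cairo_scaled_font_glyph_device_extents():
`Box`, `intersects`, `union` and `extents_overlap` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a, b):
    """True if the two boxes share interior area (touching edges do not count)."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def union(a, b):
    """Smallest box containing both inputs."""
    return Box(min(a.x1, b.x1), min(a.y1, b.y1),
               max(a.x2, b.x2), max(a.y2, b.y2))


def extents_overlap(glyph_boxes):
    """Test each glyph box against the total extents of the glyphs before it."""
    total = None
    for box in glyph_boxes:
        if total is not None and intersects(box, total):
            return True
        total = box if total is None else union(total, box)
    return False


# One line of three abutting glyphs: no overlap reported.
single_line = [Box(0, 0, 10, 10), Box(10, 0, 20, 10), Box(20, 0, 30, 10)]

# The same line followed by a second line: no two glyphs touch, but the
# second glyph of line two falls inside the accumulated extents.
two_lines = single_line + [Box(0, 12, 10, 22), Box(10, 12, 20, 22)]

print(extents_overlap(single_line))
print(extents_overlap(two_lines))
```

The check is cheap and exact enough for a single line submitted in order,
which is why the multi-line false positive rarely matters in practice.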

> 2. Due to font kerning, glyph extents are often found to be overlapping,
>    although the glyphs themselves are not actually overlapping.

Right, this is relatively common, about 25% of cases in Firefox, iirc.
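
A toy illustration of that false positive, assuming glyph ink is modelled
as a set of pixel coordinates (the shapes and helper names here are made
up for the example): two kerned diagonal strokes, much like the adjacent
legs of "AV", have intersecting extents but share no pixel.

```python
def bounding_box(pixels):
    """Axis-aligned extents (x1, y1, x2, y2) of a set of (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def boxes_intersect(a, b):
    """True if two (x1, y1, x2, y2) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Two parallel diagonal strokes, the second kerned two pixels to the right.
stroke_a = {(0, 0), (1, 1), (2, 2), (3, 3)}
stroke_b = {(2, 0), (3, 1), (4, 2), (5, 3)}

extents_hit = boxes_intersect(bounding_box(stroke_a), bounding_box(stroke_b))
ink_hit = bool(stroke_a & stroke_b)

print(extents_hit)  # extents-based test reports an overlap
print(ink_hit)      # the ink itself does not overlap
```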

> The important question here is how we can actually manage to use the "via mask"
> path less. Can we remove the overlap factor completely? Assuming that not using
> a mask is wrong, how wrong are the results going to be? If the visual
> difference is small enough perhaps we can make this compromise to increase
> performance (or use an environment variable and leave it to the user to force
> the fast behavior).

No, the visual result is wrong, and text output is something people care
immensely about. The performance you measured is about 5-10x slower than
what can be achieved using an intermediate mask (guesstimating based on the
i965 timings). So the extra step is not the bottleneck per se.

Once I no longer feel embarrassed by the ddx performance, I'll gladly
embarrass mesa instead.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre