[HarfBuzz] optimization for ASCII-only text

Behdad Esfahbod behdad at behdad.org
Thu Aug 9 11:21:55 PDT 2012


Hi Jonathan,

Thanks for bringing this up.  And more profiling is hugely appreciated.

I'd like to take it in a different direction though: if the text is all ASCII
and all glyphs are in the font, then the normalization pass should really boil
down to a cmap lookup for each glyph.  In which case we should be able to
combine it with hb_map_glyphs() to make one cmap lookup per character instead
of the current two.  That should make the normalizer overhead go away.  Indeed,
if you check the profile, all the time spent in the normalizer is in
HBGetGlyph, and is equal to the time spent in hb_substitute_default.
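
To make the "one lookup instead of two" point concrete, here is a minimal
sketch in C.  It is not HarfBuzz code: every toy_* name is hypothetical, the
cmap is a made-up table covering printable ASCII only, and the call counter
exists purely so the saving can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are HarfBuzz API. */
static unsigned toy_cmap_calls = 0;   /* counts lookups so the saving is visible */

/* Toy cmap: covers printable ASCII only, glyph id = codepoint - 0x20 + 1. */
static bool toy_cmap_lookup(uint32_t cp, uint32_t *glyph)
{
    toy_cmap_calls++;
    if (cp >= 0x20 && cp <= 0x7E) {
        *glyph = cp - 0x20 + 1;
        return true;
    }
    *glyph = 0;   /* treat 0 as the "missing glyph" id */
    return false;
}

/* Two separate passes: the first only asks "is it in the font?", the
 * second fetches the glyph id again.  Two lookups per character. */
static void toy_two_pass(const uint32_t *text, size_t len, uint32_t *glyphs)
{
    assert(text != NULL && glyphs != NULL);
    uint32_t scratch;
    for (size_t i = 0; i < len; i++)
        (void) toy_cmap_lookup(text[i], &scratch);
    for (size_t i = 0; i < len; i++)
        (void) toy_cmap_lookup(text[i], &glyphs[i]);
}

/* Fused pass: one lookup per character.  Returns how many characters were
 * missing from the font and would still need a slower path. */
static size_t toy_map_and_normalize(const uint32_t *text, size_t len,
                                    uint32_t *glyphs)
{
    assert(text != NULL && glyphs != NULL);
    size_t missing = 0;
    for (size_t i = 0; i < len; i++) {
        if (!toy_cmap_lookup(text[i], &glyphs[i]))
            missing++;   /* a real shaper would try decomposition here */
    }
    return missing;
}
```

For a three-character ASCII run, toy_two_pass performs six lookups and
toy_map_and_normalize performs three; the fused version also reports how many
characters fell outside the font, which is where any real normalization work
would have to start.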

The other easy fish would be hb_set_unicode_props.  Maybe I'll make that lazy.
Don't know.  We need some of that stuff after mapping to glyphs, so we need to
be able to predict whether we would ever need to look at it...  Or make it
faster.
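
One way "lazy" could look, again as a hedged sketch rather than anything in
the tree: compute a property the first time it is asked for and cache it.
The toy_* names and the three-way category are invented for illustration, and
the sketch simply assumes the original codepoint is still available when the
property is finally requested, which is exactly the part that needs thought.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified "general category"; not HarfBuzz's enum. */
typedef enum { TOY_GC_LETTER, TOY_GC_DIGIT, TOY_GC_OTHER } toy_gc_t;

typedef struct {
    uint32_t codepoint;    /* kept around so props can be computed later */
    bool     props_valid;  /* false until the first query */
    toy_gc_t gc;           /* cached result, meaningful only if props_valid */
} toy_info_t;

static unsigned toy_props_computed = 0;  /* counts the expensive computations */

/* Stand-in for the expensive property lookup (ASCII-only toy rules). */
static toy_gc_t toy_compute_gc(uint32_t cp)
{
    toy_props_computed++;
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return TOY_GC_LETTER;
    if (cp >= '0' && cp <= '9')
        return TOY_GC_DIGIT;
    return TOY_GC_OTHER;
}

/* Adding a character does no property work at all. */
static toy_info_t toy_info_init(uint32_t cp)
{
    toy_info_t info = { cp, false, TOY_GC_OTHER };
    return info;
}

/* Properties are computed on first use and cached afterwards. */
static toy_gc_t toy_info_get_gc(toy_info_t *info)
{
    assert(info != NULL);
    if (!info->props_valid) {
        info->gc = toy_compute_gc(info->codepoint);
        info->props_valid = true;
    }
    return info->gc;
}
```

If nothing ever calls toy_info_get_gc, the cost is zero; if something does,
the cost is paid once per character.  The price is an extra flag check on
every access, so this only wins when most runs never look at the properties.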

Makes sense?

behdad

On 08/09/2012 01:32 PM, Jonathan Kew wrote:
> Hi Behdad,
> 
> While complex-script shaping is obviously far more interesting, in practice
> there is a lot of very simple ASCII text on the web. So what would you think
> of adding a minor optimization that looks like it can give us about 10% gain
> on shaping ASCII text with simple fonts? The idea is to make hb_buffer_add
> check whether any non-ASCII characters have been put in the buffer; and if
> not, there's no need to run the normalization pass.
> 
> (Of course, there are plenty of non-ASCII characters that could also be
> present without normalization becoming relevant, but I didn't want to make the
> check any more expensive than a simple character-code comparison, and
> optimizing performance of ASCII-only runs will benefit a lot of real-world
> text for minimal effort.)
> 
> This was prompted by profile data such as
> http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d
> (see https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25), which relates
> to layout of a large, almost purely ASCII document. This shows the
> normalization pass - which we know is redundant for ASCII-only text -
> contributing around 10% of the total shaping time. With this patch, that time
> simply vanishes from the profile.
> 
> JK
> 
> 
> 


