[HarfBuzz] Ngapi HarfBuzz Hackfest report (February 2013, London)

Behdad Esfahbod behdad at behdad.org
Mon Feb 25 21:37:42 PST 2013


Hi everyone,

I pushed harfbuzz-0.9.13 out earlier today, and a hackfest report is overdue.

Jonathan Kew and I met for the week of February 11 in London, UK, and did more
HarfBuzz hacking.  Martin Hosken joined us on Tuesday to share his valuable
insight into Myanmar and other South-East Asian scripts.

Here is what we achieved:


= Myanmar

We implemented a brand new Myanmar complex shaper based on the spec released
by Microsoft [1], which is also what went into Windows 8.  Thanks to the spec,
this was a straightforward task.  The Myanmar spec is so much simpler than the
Indic specs that we decided that a separate shaper with a separate state
machine is more suitable to the task.  Thanks to the powerful ragel tool, the
resulting shaper turned out to be very straightforward and easy to get to
match Windows 8 results.  We are essentially matching Uniscribe in every case
except for a small corner-case bug in Uniscribe.  It should be
indistinguishable to font developers and users.

[1] http://www.microsoft.com/typography/OpenTypeDev/myanmar/intro.htm


= Tai Tham, Cham, and New Tai Lue

While at it, we added a new South-East Asian shaper (called 'sea' in the code)
to handle simpler scripts that only have left-matras and prebase-reordering
medials.  Tai Tham, Cham, and New Tai Lue go through the new shaper, and all
three work as expected as far as our testing goes.


= Devanagari

Fixed the (embarrassing) issue with eyelash Ra in fonts that use the old-style
Devanagari spec.  We match Uniscribe in that case now.


= Malayalam

Fixed a bug with interaction of dotless-reph and prebase-reordering Ra.  It
happens that Uniscribe has the same bug.


= Kannada

Fixed a couple of bugs in the lookup processing that, while not
Kannada-specific, were being hit with various Kannada fonts.


= 'Phags-Pa

Fixed shaping of 'Phags-Pa U+A872, which is the first character in Unicode to
have Joining_Type=L (left-joining).


= "Default_Ignorables"

While fixing some Kannada issues, we ended up implementing a rather
sophisticated way of handling Default_Ignorable characters.  Default_Ignorable
is a category of Unicode characters that are by default not shown on the
screen.  These include ZWJ, ZWNJ, and SOFT HYPHEN, among others.  Leaving
the joiners aside for a moment, you really don't want the others to affect your
GSUB/GPOS matching.  I.e., a SOFT HYPHEN shouldn't break your ligature or kerning.

ZWJ/ZWNJ are more /complicated/.  According to Unicode, ZWNJ should disable a
ligature, while ZWJ should encourage it.  Before this change, and in any other
engine we have tested, inserting a ZWJ in fact breaks ligatures, as it blocks
the GSUB rules.

With this change, this is what we do now:

  * When matching GSUB rules: whenever we see a glyph for a Default_Ignorable
character other than ZWNJ, if that glyph matches the GSUB rule, we proceed
normally.  Otherwise, instead of jumping to a "no match", we skip the
Default_Ignorable glyph and keep matching.  As such, if the font has, e.g.,
a rule that matches the sequence 'f',ZWJ,'i', that rule will still match a
sequence of 'f',ZWJ,'i'.  But so will a rule for the sequence 'f','i', which
skips the ZWJ automatically.

  * When matching GPOS rules: we simply ignore all Default_Ignorable glyphs,
including ZWNJ.

  * For "basic shaping features" of Indic-like shapers, we disable the
automatic rules above for ZWJ and ZWNJ (but not other Default_Ignorables).
Indic-like scripts have very specific meanings attached to ZWJ and ZWNJ, and
we leave it completely to the font designer to tell us what to do.

We think that this is a major improvement over what we used to do (and every
other engine still does).  Feedback appreciated.
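
For illustration only, the three rules above can be modeled with a small
Python sketch.  This is not the HarfBuzz code: characters stand in for
glyphs, the ignorable set is a hand-picked subset, and the names
`skippable`, `match_rule`, and the `indic_basic` flag are hypothetical
names made up for this example.

```python
# Illustrative model only, not HarfBuzz code.  Characters stand in for
# glyphs, and DEFAULT_IGNORABLES is a small subset of the Unicode list.
ZWNJ = "\u200C"
ZWJ = "\u200D"
SOFT_HYPHEN = "\u00AD"
DEFAULT_IGNORABLES = {ZWNJ, ZWJ, SOFT_HYPHEN}


def skippable(glyph, table, indic_basic=False):
    """Return True if the matcher may step over this glyph."""
    if glyph not in DEFAULT_IGNORABLES:
        return False
    if indic_basic and glyph in (ZWJ, ZWNJ):
        # Indic-like basic shaping features: joiners are left to the font.
        return False
    if table == "GSUB" and glyph == ZWNJ:
        # ZWNJ is never skipped for GSUB, so it still blocks ligatures.
        return False
    return True


def match_rule(glyphs, start, rule, table="GSUB", indic_basic=False):
    """Match `rule` against `glyphs` from `start`.

    Returns the list of matched positions, or None if there is no match.
    """
    positions = []
    i = start
    for expected in rule:
        while i < len(glyphs):
            glyph = glyphs[i]
            can_skip = skippable(glyph, table, indic_basic)
            if table == "GPOS" and can_skip:
                # GPOS: ignorables are ignored outright.
                i += 1
                continue
            if glyph == expected:
                # GSUB: an explicit match wins, even on an ignorable.
                break
            if can_skip:
                # Skip the ignorable instead of reporting "no match".
                i += 1
                continue
            return None
        else:
            return None  # ran out of glyphs before matching `expected`
        positions.append(i)
        i += 1
    return positions


print(match_rule(["f", ZWJ, "i"], 0, ["f", "i"]))        # [0, 2]
print(match_rule(["f", ZWJ, "i"], 0, ["f", ZWJ, "i"]))   # [0, 1, 2]
print(match_rule(["f", ZWNJ, "i"], 0, ["f", "i"]))       # None
print(match_rule(["A", ZWNJ, "V"], 0, ["A", "V"], table="GPOS"))  # [0, 2]
```

In this model an 'f','i' rule matches across a ZWJ or a SOFT HYPHEN, a ZWNJ
still blocks it under GSUB but not under GPOS, and with `indic_basic=True`
both joiners go back to blocking unless the rule names them explicitly.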


= Misc

Fixed a tricky bug with sanitizing fonts that have overlapping (and broken) tables.

We also streamlined handling of zero-width marks for Indic and non-Indic
scripts, to match what Uniscribe does.


= Summary

I think this was yet another tremendously productive week of pair-programming
with Jonathan, and I would like to thank him for finding the time to make this
happen.  I would also like to thank Martin Hosken, whose expertise in the
scripts covered in this hackfest was key to making the progress that we did.

The hackfest also marked two major milestones for HarfBuzz the shaper:

  * We fixed all shaping bugs known to us,

  * As far as we know, we correctly shape every script that Windows 8 shapes,
and then some.

Last but certainly not least, I would like to thank all the other people on
the list, without whose testing and feedback we couldn't have gotten this far.
To avoid embarrassingly leaving people out, I will pass on listing names, but
you know who you are!

I would also like to thank our employers, Google and Mozilla, for graciously
funding and hosting the hackfest.

Is there a script you want to see HarfBuzz support that it currently doesn't?
Just ask, and we'll make it happen.


Cheers,
-- 
behdad
http://behdad.org/


