[HarfBuzz] unbreaking mixed-up khmer fonts

Jonathan Kew jfkthame at googlemail.com
Mon Nov 19 13:35:42 PST 2012


>  >> One thing has come up: it seems we've broken things for some widely-used
>  >> Khmer fonts. E.g. the font "hanuman.ttf" used on http://khmer.rfa.org/.
>  >> This font has a 'liga' feature, but nevertheless relies on going through
>  >> the indic shaper; it duplicates many of its lookups from 'abvs', etc., into
>  >> 'liga' and 'clig', but it does not use these features to do the pre-base
>  >> vowel or subjoined consonants.
>  >>
>  >> The result is that it fails when shaped using the generic shaper, but works
>  >> ok if we use the indic shaper with 'liga' disabled (because the features it
>  >> puts in liga are just references to the lookups that are also in abvs etc,
>  >> so ignoring liga is fine).
>  >>
>  >> And so I think we need to revert the Khmer part of
>  >> 981748cb2e9b48b77177b19ec1f972cab7afda89 (but keep the Myanmar part), and
>  >> rely only on 6b389ddc3623d042ded4731f4d62dc354002fdd0 to deal with the
>  >> pre-base duplication that we were seeing with fonts like Kh-Battambang.
>  >> Remind me if there's something else that would break?
>  >
>  > I think we were seeing Kh-Battambang and family fail to do pre-base
>  > reordering because they don't have a 'pref' feature. We started adding the
>  > hardcoded-Ra support back for 'pref' stuff, but gave up since that was ugly.
>  >
>  > What do you think we should do now?
>  >
>
> Argh... yes, you're right, that breaks some (but not all) other fonts.
> There seem to be multiple versions of Kh-Battambang in circulation, for
> example; some will break, others won't. So I keep confusing myself by
> testing with varying versions. Sigh.
>
> I'll experiment some more with the various fonts I have on hand...
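The two policies being weighed in the quoted exchange can be modelled, very roughly, as below. This is a hypothetical sketch, not HarfBuzz's API or its actual logic: the names `Shaper`, `choose_khmer_shaper` and `revert_khmer_part` are invented for illustration, and it assumes (as the quoted message implies) that the Khmer part of 981748cb sends a Khmer font with 'liga' to the generic shaper.

```python
# Hypothetical, simplified model of the two Khmer shaper-selection policies
# discussed in this thread. This is NOT HarfBuzz's API or its actual logic.
from enum import Enum


class Shaper(Enum):
    GENERIC = "generic"
    INDIC = "indic"


def choose_khmer_shaper(gsub_features, revert_khmer_part=False):
    """Return (shaper, features_to_apply) for a Khmer font.

    gsub_features: OpenType feature tags present in the font's GSUB table.
    revert_khmer_part=False models the assumed current behaviour: a Khmer
    font that has 'liga' is handed to the generic shaper.
    revert_khmer_part=True models the proposed revert: always use the Indic
    shaper, but with 'liga' disabled.
    """
    features = set(gsub_features)
    if not revert_khmer_part and "liga" in features:
        # Generic shaper: no Indic-style reordering; features applied as-is.
        return Shaper.GENERIC, features
    # Indic shaper; 'liga' is dropped because fonts like Hanuman only repeat
    # lookups from 'abvs' etc. in it.
    return Shaper.INDIC, features - {"liga"}


# Hypothetical feature set, loosely based on the description of Hanuman.
hanuman_like = {"abvs", "liga", "clig"}
print(choose_khmer_shaper(hanuman_like))
print(choose_khmer_shaper(hanuman_like, revert_khmer_part=True))
```

The model deliberately ignores everything else a real shaper looks at; it only captures the single switch that the proposed revert would flip.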

I've put a test page at http://people.mozilla.org/~jkew/kh/test.html 
that renders the sequences "ក្រុ ខេ គៀ" with 100+ fonts from several 
sources. They're mostly different versions of KhmerOS fonts, but there 
are a few others as well.
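For reference, the three test clusters exercise a subscript Ro (which should render pre-base), a simple pre-base vowel, and a split vowel. The snippet below, assuming the page uses the usual encoding of these clusters, lists their code points and Unicode names; the name `TEST_CLUSTERS` is just for illustration.

```python
import unicodedata

# The three test clusters, spelled with escapes so every code point is explicit.
TEST_CLUSTERS = [
    "\u1780\u17d2\u179a\u17bb",  # KA + COENG + RO + VOWEL SIGN U (subscript Ro)
    "\u1781\u17c1",              # KHA + VOWEL SIGN E (pre-base vowel)
    "\u1782\u17c0",              # KO + VOWEL SIGN IE (split vowel)
]

for cluster in TEST_CLUSTERS:
    names = ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster
    )
    print(f"{cluster}: {names}")
```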

Some initial observations:

(a) The font "Sankor.ttf" fails completely; inspection shows that it 
lacks any OpenType features, either Indic-style or generic. So we can 
ignore that one.

(b) A number of the fonts - colored red in the test page - are rejected 
by OTS (for having a bad OS/2 table; I haven't looked into details) in 
Firefox and Chrome, and so you'll get fallback to whatever your local 
default is. In Firefox, at least, you can set 
gfx.downloadable_fonts.sanitize to false in about:config (and reload the 
page) to bypass the sanitizer and see the actual fonts.

(c) In current Firefox Nightly, which has harfbuzz 
43b653150081a2f9dc6b7481229ac4cd952575dc, almost all the fonts shape 
correctly; the exception is Hanuman.

(d) If I make the change suggested above, so that Khmer fonts with 
'liga' are shaped via the Indic shaper (but with 'liga' disabled), this 
fixes Hanuman, but breaks a bunch of other fonts where the pre-base Ra 
no longer happens - in particular, all the KhUnicode210 collection, and 
a number of the KhmerUnicodeFonts. Many of those that break are the red 
samples that OTS would reject as webfonts, but some of the non-red ones 
break as well.

(e) Of course, if we also revert the liga-disabling commit, a number of 
the fonts (all the KhUnicode210 faces, and about half the 
KhmerUnicodeFonts faces) exhibit the problem of doubling the left part 
of the matra in "គៀ", which was what started us into this morass.
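A deliberately simplified sketch of the two operations at stake, split-vowel decomposition and pre-base movement, may help show where the doubling can come from. It is modelled loosely on what an Indic-style shaper does and is not HarfBuzz code; `decompose_and_reorder` is a hypothetical name. Two assumptions: the Khmer two-part vowels U+17BE, U+17BF, U+17C0, U+17C4 and U+17C5 are split into a left part U+17C1 plus the original vowel; and COENG+RO is moved before the base unconditionally, whereas in practice (as discussed above) whether the pre-base Ra happens depends on the font's features such as 'pref'.

```python
# Hypothetical, simplified model; not HarfBuzz code.
COENG, RO, VOWEL_E = 0x17D2, 0x179A, 0x17C1

# Khmer two-part vowels whose left part looks like VOWEL SIGN E (U+17C1).
SPLIT_VOWELS = {0x17BE, 0x17BF, 0x17C0, 0x17C4, 0x17C5}


def decompose_and_reorder(cluster):
    """Return the cluster's code points in visual (pre-base first) order."""
    cps = [ord(c) for c in cluster]
    base, rest = cps[0], cps[1:]
    pre_base, post = [], []
    i = 0
    while i < len(rest):
        cp = rest[i]
        if cp == COENG and i + 1 < len(rest) and rest[i + 1] == RO:
            pre_base += [COENG, RO]      # subscript Ro renders before the base
            i += 2
            continue
        if cp == VOWEL_E:
            pre_base.append(VOWEL_E)     # simple pre-base vowel
        elif cp in SPLIT_VOWELS:
            pre_base.append(VOWEL_E)     # left part moves before the base
            post.append(cp)              # remainder stays after it
        else:
            post.append(cp)
        i += 1
    return pre_base + [base] + post


for text in ["\u1780\u17d2\u179a\u17bb", "\u1781\u17c1", "\u1782\u17c0"]:
    print(" ".join(f"U+{cp:04X}" for cp in decompose_and_reorder(text)))
```

After this step "គៀ" already carries a separate left part before the base. Presumably, if a font's own lookups (for instance the ones it puts in 'liga') also draw that left part for U+17C0, it appears twice, which would match the doubling described in (e).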


