[HarfBuzz] hangul shaper patches

Jonathan Kew jfkthame at googlemail.com
Thu Jan 23 00:42:55 PST 2014


On 23/1/14 03:39, Dohyun Kim wrote:
> I've just found that jieubsida fonts [1] from Tsukurimashou Font
> Project [2] do not work well with current hangul shaper.
>
> ~$ hb-unicode-encode AC00 | hb-shape --script=hang JieubsidaBatang.otf
> [uni1100=0+0|uni1161=0+833]
>
> Expected output is:
>
> ~$ hb-unicode-encode AC00 | hb-shape --script=latn --features=ljmo,vjmo JieubsidaBatang.otf
> [uniAC00=0+833]
>
> The reason seems to be that hangul shaper is currently applying *jmo
> features too early. The author of jieubsida fonts has intended to
> apply *jmo features after ccmp feature, and so arranged the order of
> gsub lookup tables. But hangul shaper is applying *jmo features before
> everything else.

I don't think that's quite accurate; rather, the issue occurs because 
the hangul shaper isn't applying *jmo features to glyphs that result 
from ccmp decomposition. And then because the *jmo features haven't been 
applied to choose contextual forms of the jamos, the ligature that was 
expected to re-compose the syllable doesn't match either. See below.

>
> I am curious about what the output on windows 8 machine is, which is
> not available to me for now.

With --shaper=uniscribe on a Win8 machine, I get the "incorrect" output 
[uni1100=0+0|uni1161=0+833], matching HarfBuzz's behavior.

So I think this is a font error. The font is using ccmp to decompose the 
syllable AC00 into L and V jamos, but then expecting the shaper to apply 
*jmo features to the resulting glyphs. That doesn't work, because 
decomposing via ccmp has no awareness of the hangul-specific syllable 
structure.

(Then, after choosing contextual forms of the jamos, it expects to use 
liga to reassemble them into the single glyph for the syllable.)

A syllable such as AC00 will be decomposed into jamos *if necessary* by 
code within the shaper itself, in which case it will also apply features 
appropriately. The font should *not* use the generic ccmp feature to 
decompose it, unless it intends to do *everything* using generic global 
features, not the hangul-specific features.
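
For reference, the decomposition in question is the standard arithmetic 
one defined by Unicode. The sketch below is only an illustration of that 
arithmetic, not HarfBuzz's actual code; the function name is made up for 
the example. It shows why AC00 maps to the <1100, 1161> pair seen in the 
hb-shape output above.

```python
# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul_syllable(cp):
    """Return the L, V (and optional T) jamo code points for a syllable.

    Hypothetical helper for illustration; code points outside the
    precomposed syllable block are returned unchanged.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    jamos = [
        L_BASE + s_index // N_COUNT,
        V_BASE + (s_index % N_COUNT) // T_COUNT,
    ]
    t_index = s_index % T_COUNT
    if t_index:
        # t_index 0 means "no trailing consonant" (an LV syllable).
        jamos.append(T_BASE + t_index)
    return jamos


print([hex(j) for j in decompose_hangul_syllable(0xAC00)])
```

AC00 is the first syllable in the block, so it has no trailing consonant 
and yields just the L and V jamos.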

I guess this font used to work because the old "dumb" hangul shaper 
applied the *jmo features globally, but this is not how they're intended 
to be used, and is not how Uniscribe works. The shaper is now applying 
the features selectively, as intended.

So the font is using the wrong strategy. It should be simplified to 
remove the syllable decompositions from ccmp; that's handled by the 
shaper itself. (And it doesn't need the liga feature to reassemble the 
original syllables, either, as the shaper won't decompose them unless 
actually necessary, e.g. to support an <LV, T> sequence.)
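
To make the <LV, T> case concrete: in Unicode terms, an LV syllable 
followed by a trailing-consonant jamo is equivalent to a single LVT 
syllable. The sketch below shows only that Unicode arithmetic; the 
function names are invented for the example, and how the shaper decides 
between composing and decomposing for a given font is not shown here.

```python
# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, T_BASE = 0xAC00, 0x11A7
T_COUNT = 28
S_COUNT = 19 * 21 * T_COUNT  # 11172 precomposed syllables


def is_lv_syllable(cp):
    """True if cp is a precomposed syllable with no trailing consonant."""
    s_index = cp - S_BASE
    return 0 <= s_index < S_COUNT and s_index % T_COUNT == 0


def compose_lv_t(lv, t):
    """Return the LVT syllable for an <LV, T> pair, or None if not one.

    Hypothetical helper for illustration only.
    """
    t_index = t - T_BASE
    if is_lv_syllable(lv) and 0 < t_index < T_COUNT:
        return lv + t_index
    return None


print(hex(compose_lv_t(0xAC00, 0x11A8)))
```

If a font has no glyph for the composed LVT syllable, the sequence has 
to be handled as separate jamos instead, which is the kind of situation 
where decomposition becomes necessary.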

JK
