[HarfBuzz] hangul shaper patches
Jonathan Kew
jfkthame at googlemail.com
Thu Jan 23 00:42:55 PST 2014
On 23/1/14 03:39, Dohyun Kim wrote:
> I've just found that jieubsida fonts [1] from Tsukurimashou Font
> Project [2] do not work well with current hangul shaper.
>
> ~$ hb-unicode-encode AC00 | hb-shape --script=hang JieubsidaBatang.otf
> [uni1100=0+0|uni1161=0+833]
>
> Expected output is:
>
> ~$ hb-unicode-encode AC00 | hb-shape --script=latn
> --features=ljmo,vjmo JieubsidaBatang.otf
> [uniAC00=0+833]
>
> The reason seems to be that hangul shaper is currently applying *jmo
> features too early. The author of jieubsida fonts has intended to
> apply *jmo features after ccmp feature, and so arranged the order of
> gsub lookup tables. But hangul shaper is applying *jmo features before
> everything else.
I don't think that's quite accurate; rather, the issue occurs because
the hangul shaper isn't applying *jmo features to glyphs that result
from ccmp decomposition. Then, because the *jmo features haven't been
applied to choose contextual forms of the jamos, the ligature that was
expected to re-compose the syllable doesn't match either. See below.
>
> I am curious about what the output on windows 8 machine is, which is
> not available to me for now.
With --shaper=uniscribe on a Win8 machine, I get the "incorrect" output
[uni1100=0+0|uni1161=0+833], matching HarfBuzz behavior.
So I think this is a font error. The font is using ccmp to decompose the
syllable AC00 into L and V jamos, but then expecting the shaper to apply
*jmo features to the resulting glyphs. That doesn't work, because
decomposing via ccmp has no awareness of the hangul-specific syllable
structure.
(Then, after choosing contextual forms of the jamos, it expects to use
liga to reassemble them into the single glyph for the syllable.)
A syllable such as AC00 will be decomposed into jamos *if necessary* by
code within the shaper itself, in which case it will also apply features
appropriately. The font should *not* use the generic ccmp feature to
decompose it, unless it intends to do *everything* using generic global
features, not the hangul-specific features.
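For reference, the decomposition of a precomposed syllable into jamos is
plain arithmetic defined by the Unicode Standard, which is why a shaper
can do it without help from the font. The following is only a sketch of
that arithmetic, not HarfBuzz's actual code; the function name
decompose_syllable is made up for illustration.

```python
# Hangul syllable constants from the Unicode Standard (conjoining jamo behavior).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_syllable(cp):
    """Return the jamo code points for a precomposed syllable.

    Hypothetical helper for illustration. Code points outside the
    precomposed syllable block are returned unchanged.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    jamos = [
        L_BASE + s_index // N_COUNT,              # leading consonant (L)
        V_BASE + (s_index % N_COUNT) // T_COUNT,  # vowel (V)
    ]
    t_index = s_index % T_COUNT
    if t_index:                                   # 0 means no trailing consonant
        jamos.append(T_BASE + t_index)            # trailing consonant (T)
    return jamos


print([hex(cp) for cp in decompose_syllable(0xAC00)])
```

For AC00 this gives 1100 and 1161, the same two jamos as the uni1100 and
uni1161 glyphs in the hb-shape output quoted above.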
I guess this font used to work because the old "dumb" hangul shaper
applied the *jmo features globally, but this is not how they're intended
to be used, and is not how Uniscribe works. The shaper is now applying
the features selectively, as intended.
So the font is using the wrong strategy. It should be simplified to
remove the syllable decompositions from ccmp; that's handled by the
shaper itself. (And it doesn't need the liga feature to reassemble the
original syllables, either, as the shaper won't decompose them unless
actually necessary, e.g. to support an <LV, T> sequence.)
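To illustrate the <LV, T> case: an LV syllable followed by a trailing
jamo can be composed into a single LVT syllable by Unicode arithmetic,
and decomposition is only needed when that does not work out. The sketch
below assumes the deciding factor is whether the font has a glyph for
the composed LVT syllable; that condition, the function names, and the
font_has_glyph callback are all hypothetical and are not taken from the
HarfBuzz source.

```python
# Hangul syllable constants from the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT
S_COUNT = L_COUNT * N_COUNT


def is_lv(cp):
    """True if cp is a precomposed syllable with no trailing consonant."""
    s_index = cp - S_BASE
    return 0 <= s_index < S_COUNT and s_index % T_COUNT == 0


def is_t(cp):
    """True if cp is a modern trailing consonant jamo (U+11A8..U+11C2)."""
    return T_BASE < cp <= T_BASE + T_COUNT - 1


def handle_lv_t(lv, t, font_has_glyph):
    """Hypothetical handling of an <LV, T> pair.

    font_has_glyph is an assumed callback taking a code point and
    returning whether the font covers it.
    """
    if not (is_lv(lv) and is_t(t)):
        return [lv, t]                    # not an <LV, T> pair; leave as-is
    lvt = lv + (t - T_BASE)               # compose into a single LVT syllable
    if font_has_glyph(lvt):
        return [lvt]                      # no decomposition needed
    # Assumed fallback: decompose LV so that jamo features can apply to L, V, T.
    s_index = lv - S_BASE
    return [
        L_BASE + s_index // N_COUNT,
        V_BASE + (s_index % N_COUNT) // T_COUNT,
        t,
    ]


print([hex(cp) for cp in handle_lv_t(0xAC00, 0x11A8, lambda cp: True)])
```

With a font that covers AC01, the pair <AC00, 11A8> stays a single
syllable; only when it is missing does the sketch fall back to the three
jamos 1100, 1161, 11A8.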
JK