[HarfBuzz] hangul shaper patches
Jonathan Kew
jfkthame at googlemail.com
Mon Jan 20 12:30:54 PST 2014
On 20/1/14 15:26, Dohyun Kim wrote:
>
> I have just tested this kind of input string and the result is a
> little disappointing:
> Input string <U+1107,U+1109,U+1110,U+1161> does not render well. The
> output of current (patched) harfbuzz with UnBatang font is
> [uni1121=0+1024|uniD0C0=2+1024], the expected output being
> [uniA972.xxxx|uni1161.xxxx]
IIRC, <U+1107,U+1109,U+1110> is *not* canonically equivalent to U+A972,
even though it may be perfectly logical to spell the complex jamo as a
sequence of simpler jamo letters.
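
The equivalence claim can be checked with a small sketch using Python's standard unicodedata module. Python is only an illustration here and HarfBuzz is not involved. The sketch assumes the interpreter's Unicode database covers Hangul Jamo Extended-A, and the helper name canonically_equivalent is made up for this example.

```python
import unicodedata

spelled_out = "\u1107\u1109\u1110"  # PIEUP, SIOS, THIEUTH as separate jamo
precomposed = "\uA972"              # the single complex leading jamo

def canonically_equivalent(a, b):
    # Two strings are canonically equivalent iff their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# An empty decomposition string means the character has no decomposition mapping.
has_decomposition = unicodedata.decomposition(precomposed) != ""
equivalent = canonically_equivalent(spelled_out, precomposed)
print(has_decomposition, equivalent)
```

If both values come back false, normalization will never turn the spelled-out sequence into U+A972 or the reverse, so any mapping between them has to come from the font or the shaper.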
>
> The reason seems to be that we are currently applying "ccmp" opentype
> feature too late. If "ccmp" feature could be applied before the
> process of hangul shaper, the issue would disappear.
Currently, this example fails because the pair <U+1110,U+1161> gets
composed to U+D0C0 during the preprocess_text function, and so by the
time any OpenType features are applied, it's too late.
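
To illustrate why that pair composes, here is a sketch of the Hangul syllable composition arithmetic from the Unicode Standard, written in Python. It is not the code in preprocess_text, and the function name compose_lv is an assumption made for this example.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE = 0xAC00, 0x1100, 0x1161
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def compose_lv(l, v):
    """Return the precomposed LV syllable, or None if l or v is not a simple jamo."""
    l_index, v_index = l - L_BASE, v - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    return None

# The trailing pair of the problem string composes on its own.
syllable = compose_lv(0x1110, 0x1161)

# NFC composes the same pair and leaves the first two jamo untouched.
nfc = unicodedata.normalize("NFC", "\u1107\u1109\u1110\u1161")
print(hex(syllable), [hex(ord(c)) for c in nfc])
```

Note that compose_lv(0xA972, 0x1161) returns None: the complex leading jamo lies outside the simple L range, so it never takes part in this algorithmic composition.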
Fixing this is tricky within the current structure of the shaper, as the
main hangul shaper function needs to run before we map the Unicode
characters to glyphs, but the ccmp feature needs to run after the
default char-to-glyph mapping has been done.

Is this actually important? Note that Windows behaves similarly, and so
data that has "spelled-out" representations of complex jamos won't work
there either. AIUI, the recommended practice is to use the precomposed
Unicode characters such as U+A972 directly - and because these do *not*
have decompositions, mixing the two forms will lead to confusion and
problems for users. Perhaps it's better that the non-preferred spelling
does not render "correctly".
JK