[HarfBuzz] hangul shaper patches

Mon Jan 20 12:30:54 PST 2014

On 20/1/14 15:26, Dohyun Kim wrote:

>
> I have just tested this kind of input string and the result is a
> little disappointing:
> Input string <U+1107,U+1109,U+1110,U+1161> does not render well. The
> output of current (patched) harfbuzz with UnBatang font is
> [uni1121=0+1024|uniD0C0=2+1024], the expected output being
> [uniA972.xxxx|uni1161.xxxx]

IIRC, <U+1107,U+1109,U+1110> is *not* canonically equivalent to U+A972, 
even though it may be perfectly logical to spell the complex jamo as a 
sequence of simpler jamo letters.
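
This is easy to check against the Unicode data. Below is a minimal Python sketch using the standard unicodedata module. It is not HarfBuzz code, and NFC is used only as a rough analogy for the composition step the shaper performs; the variable names are just for illustration.

```python
import unicodedata

complex_jamo = "\uA972"              # the precomposed complex jamo
spelled_out = "\u1107\u1109\u1110"   # the "spelled-out" sequence of simple jamo

# U+A972 carries no decomposition mapping in the Unicode Character Database,
# so no normalization form relates it to the spelled-out sequence.
decomposition = unicodedata.decomposition(complex_jamo)
nfd_of_complex = unicodedata.normalize("NFD", complex_jamo)
nfc_of_spelled = unicodedata.normalize("NFC", spelled_out)

# With a vowel appended, NFC composes only the last leading consonant
# with the vowel, leaving the first two jamo untouched.
nfc_of_input = unicodedata.normalize("NFC", spelled_out + "\u1161")


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]


print("decomposition of U+A972:", repr(decomposition))
print("NFD(U+A972):", code_points(nfd_of_complex))
print("NFC(spelled-out):", code_points(nfc_of_spelled))
print("NFC(full input):", code_points(nfc_of_input))
```

The last line shows the same split as the reported output: the first two jamo stay separate and only the final pair becomes a syllable.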

>
> The reason seems to be that we are currently applying the "ccmp"
> OpenType feature too late. If the "ccmp" feature could be applied
> before the hangul shaper runs, the issue would disappear.

Currently, this example fails because the pair <U+1110,U+1161> gets
composed to U+D0C0 during the preprocess_text function, so by the
time any OpenType features are applied, the separate U+1110 that ccmp
would need to match is already gone.
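
To make the composition step concrete, here is a small Python sketch of the standard Hangul syllable composition arithmetic from the Unicode Standard (the L+V case only). It is an illustration under that assumption, not the actual code in the hangul shaper, and compose_lv is a hypothetical helper name.

```python
# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first modern leading consonant
V_BASE = 0x1161   # first modern vowel
L_COUNT = 19
V_COUNT = 21
T_COUNT = 28      # trailing consonants, counting "no trailing consonant"


def compose_lv(lead, vowel):
    """Return the LV syllable code point for a modern L + V pair, or None."""
    if not (L_BASE <= lead < L_BASE + L_COUNT):
        return None
    if not (V_BASE <= vowel < V_BASE + V_COUNT):
        return None
    l_index = lead - L_BASE
    v_index = vowel - V_BASE
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT


# The pair from the failing example composes to a precomposed syllable.
syllable = compose_lv(0x1110, 0x1161)
print("U+%04X" % syllable)

# U+A972 lies outside the modern leading-consonant range, so it has no
# precomposed syllable and must stay as a separate jamo before the vowel.
print(compose_lv(0xA972, 0x1161))
```

This is why the expected output keeps U+A972 and U+1161 as two glyphs, while the unligated U+1110 is eligible to merge with the vowel.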

Fixing this is tricky within the current structure of the shaper, as the 
main hangul shaper function needs to run before we map the Unicode 
characters to glyphs, but the ccmp feature needs to run after the 
default char-to-glyph mapping has been done.

Is this actually important? Note that Windows behaves similarly, and so 
data that has "spelled-out" representations of complex jamos won't work 
there either. AIUI, the recommended practice is to use the precomposed 
Unicode characters such as U+A972 directly - and because these do *not* 
have decompositions, mixing the two forms will lead to confusion and 
problems for users. Perhaps it's better that the non-preferred spelling 
does not render "correctly".

JK