[HarfBuzz] hangul shaper patches

Dohyun Kim nomosnomos at gmail.com
Mon Jan 20 07:26:56 PST 2014


Thank you so much, Jonathan.
Your patches to the hangul shaper work really well.

2014/1/20 Jonathan Kew <jfkthame at googlemail.com>:
> On 20/1/14 02:21, Roozbeh Pournader wrote:
>>
>> Jonathan,
>>
>> I was wondering if the new patches would have all the canonically
>> equivalent character sequences rendered the same way. Microsoft people
>> have said publicly that their Hangul shaper intentionally doesn't do that.
>>
>
> The intention is that canonically equivalent sequences should render the
> same. I'm aware that MS doesn't do this in certain cases, as mentioned:
>
>
>>          (b) a
>>     handful of words where there's an <LV, T> sequence that uniscribe
>>     doesn't support (it has no corresponding LVT syllable), but we
>>     handle by decomposing to <L, V, T> and applying jamo features.
>
>
> An example of this is <U+B4C0,U+11F0>, where uniscribe (using Malgun Gothic)
> renders the two default, unshaped glyphs for U+B4C0 (an LV syllable) and
> U+11F0 (a trailing jamo) separately, while harfbuzz decomposes U+B4C0 into
> separate leading- and vowel-jamo glyphs and then applies ljmo/vjmo/tjmo
> features so that the three jamos are properly composed into a single
> syllable block.
>
> Thus, with harfbuzz the two sequences
>   <U+B4C0,U+11F0>
>   <U+1103,U+1172,U+11F0>
> render the same. As I understand things, the Korean standard says the former
> spelling should not be used, but IMO that cannot override the fact that the
> Unicode standard defines them as canonically equivalent, so rendering them
> identically is correct.
>
> What the patched harfbuzz still -doesn't- implement is shaping "spelled out"
> versions of Old Hangul sequences with multiple L, V and/or T jamos. The old
> MS Hangul spec gave an example where the leading jamo now encoded at U+A972
> (CHOSEONG PIEUP-SIOS-THIEUTH) was encoded as the sequence
> <U+1107,U+1109,U+1110> and then composed (and similarly for the V and T
> jamos), so that a complete syllable was composed from a sequence of the form
> <L, L, L, V, V, V, T, T, T>.
>
> I experimented with a patch that would support this, and the result looked
> OK (to my un-Korean eyes) when using the UnBatang font (not so good with
> Malgun Gothic). However, this is not canonically equivalent, and my
> understanding is that with Unicode having added all the complex jamos, there
> is no longer any real requirement or desire to support such sequences. So I
> haven't included this.
>

I have just tested this kind of input string, and the result is a
little disappointing:
The input string <U+1107,U+1109,U+1110,U+1161> is not rendered well. The
output of the current (patched) harfbuzz with the UnBatang font is
[uni1121=0+1024|uniD0C0=2+1024], whereas the expected output is
[uniA972.xxxx|uni1161.xxxx]
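
To show where uniD0C0 in that output may come from, here is a small
Python sketch of the Hangul syllable composition arithmetic from the
Unicode Standard. The helper name compose_lv is only illustrative and is
not part of harfbuzz. The last two jamos of the input, <U+1110,U+1161>,
form a valid modern <L, V> pair and compose to U+D0C0, whereas U+A972
has no precomposed syllable to combine into.

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm (Unicode Standard).
S_BASE, L_BASE, V_BASE = 0xAC00, 0x1100, 0x1161
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_lv(l, v):
    """Return the precomposed LV syllable for a modern <L, V> pair, or None.

    Illustrative helper only; not a harfbuzz function.
    """
    l_index, v_index = l - L_BASE, v - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    return None


# The trailing <U+1110,U+1161> of the test string composes to U+D0C0.
lv = compose_lv(0x1110, 0x1161)
print(f"U+{lv:04X}")

# U+A972 is outside the modern L range, so nothing composes with it.
print(compose_lv(0xA972, 0x1161))

# Unicode NFC shows the same effect on the whole test string:
# only the last L and the V are merged.
nfc = unicodedata.normalize("NFC", "\u1107\u1109\u1110\u1161")
print(" ".join(f"U+{ord(c):04X}" for c in nfc))
```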

The reason seems to be that we are currently applying the "ccmp" OpenType
feature too late. If "ccmp" could be applied before the hangul shaper
does its processing, the issue would disappear.
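
A toy model of the ordering problem, in Python. This is only a sketch
under assumptions: the CCMP_LIGATURES table is hypothetical (I have not
copied it from UnBatang), apply_ccmp and compose_hangul are made-up
helper names, and real harfbuzz substitutes glyphs rather than code
points. It only shows that the two orders give different results.

```python
# Hypothetical ccmp-style ligatures, longest sequence listed first.
CCMP_LIGATURES = [
    ((0x1107, 0x1109, 0x1110), 0xA972),  # CHOSEONG PIEUP-SIOS-THIEUTH
    ((0x1107, 0x1109), 0x1121),          # CHOSEONG PIEUP-SIOS
]


def apply_ccmp(cps):
    """Replace each listed jamo sequence with its ligature, left to right."""
    out, i = [], 0
    while i < len(cps):
        for seq, lig in CCMP_LIGATURES:
            if tuple(cps[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            # No ligature starts here; keep the code point as it is.
            out.append(cps[i])
            i += 1
    return out


def compose_hangul(cps):
    """Compose adjacent modern <L, V> pairs into precomposed LV syllables."""
    out, i = [], 0
    while i < len(cps):
        is_lv = (
            i + 1 < len(cps)
            and 0x1100 <= cps[i] <= 0x1112
            and 0x1161 <= cps[i + 1] <= 0x1175
        )
        if is_lv:
            l_index, v_index = cps[i] - 0x1100, cps[i + 1] - 0x1161
            out.append(0xAC00 + (l_index * 21 + v_index) * 28)
            i += 2
        else:
            out.append(cps[i])
            i += 1
    return out


text = [0x1107, 0x1109, 0x1110, 0x1161]

# Composition first, ccmp afterwards: the last L is consumed too early.
late = apply_ccmp(compose_hangul(text))
# ccmp first, composition afterwards: the three L jamos are merged first.
early = compose_hangul(apply_ccmp(text))

print([f"U+{cp:04X}" for cp in late])
print([f"U+{cp:04X}" for cp in early])
```

In this model the "late" order gives U+1121 followed by U+D0C0, which
matches the glyph names I get now, and the "early" order gives U+A972
followed by U+1161, which is the expected result.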

Best Regards,
-- 
Dohyun Kim
Seoul, Republic of Korea

