[HarfBuzz] hangul shaper patches

Jonathan Kew jfkthame at googlemail.com
Mon Jan 20 00:12:03 PST 2014


On 20/1/14 02:21, Roozbeh Pournader wrote:
> Jonathan,
>
> I was wondering if the new patches would have all the canonically
> equivalent character sequences rendered the same way. Microsoft people
> have said publicly that their Hangul shaper intentionally doesn't do that.
>

The intention is that canonically equivalent sequences should render the 
same. I'm aware that MS doesn't do this in certain cases, as mentioned:

>          (b) a
>     handful of words where there's an <LV, T> sequence that uniscribe
>     doesn't support (it has no corresponding LVT syllable), but we
>     handle by decomposing to <L, V, T> and applying jamo features.

An example of this is <U+B4C0,U+11F0>, where uniscribe (using Malgun 
Gothic) renders the two default, unshaped glyphs for U+B4C0 (an LV 
syllable) and U+11F0 (a trailing jamo) separately, while harfbuzz 
decomposes U+B4C0 into separate leading- and vowel-jamo glyphs and then 
applies ljmo/vjmo/tjmo features so that the three jamos are properly 
composed into a single syllable block.
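
To make the decomposition step concrete, here is a minimal sketch of the arithmetic Hangul syllable decomposition defined in the Unicode Standard. It is an illustration only, not the code in the harfbuzz patches; the function and constant names are made up for this example.

```python
# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables


def decompose_syllable(cp):
    """Split a precomposed Hangul syllable into its L, V (and T) jamos."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable: U+%04X" % cp)
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    if t_index == 0:
        return [l, v]            # LV syllable, no trailing consonant
    return [l, v, T_BASE + t_index]


def is_composable_trailing(cp):
    """True if cp is a trailing jamo that has precomposed LVT syllables."""
    return T_BASE < cp < T_BASE + T_COUNT   # U+11A8..U+11C2


jamos = decompose_syllable(0xB4C0)           # the LV syllable from the example
old_t_composable = is_composable_trailing(0x11F0)
```

Here `jamos` comes out as U+1103, U+1172, and `old_t_composable` is false: U+11F0 lies outside the modern trailing-jamo range, which is why no precomposed LVT syllable exists for this sequence and the jamo features are needed instead.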

Thus, with harfbuzz the two sequences
   <U+B4C0,U+11F0>
   <U+1103,U+1172,U+11F0>
render the same. As I understand things, the Korean standard says the 
former spelling should not be used, but IMO that cannot override the 
fact that the Unicode standard defines them as canonically equivalent, 
so rendering them identically is correct.
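
The canonical equivalence of the two spellings can be checked independently of any shaper. A small sketch using Python's standard `unicodedata` module (the variable names are just for illustration):

```python
import unicodedata

precomposed = "\uB4C0\u11F0"         # <LV, T>
decomposed = "\u1103\u1172\u11F0"    # <L, V, T>

# Two strings are canonically equivalent if their NFD forms are identical.
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", decomposed))

# NFC recomposes L+V into U+B4C0 but leaves U+11F0 alone, because no
# precomposed LVT syllable exists for that trailing jamo.
nfc_form = unicodedata.normalize("NFC", decomposed)
```

Both normalization forms agree that the sequences are the same text, so a renderer that shapes them differently is distinguishing strings that Unicode treats as identical.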

What the patched harfbuzz still -doesn't- implement is shaping "spelled 
out" versions of Old Hangul sequences with multiple L, V and/or T jamos. 
The old MS Hangul spec gave an example where the leading jamo now 
encoded at U+A972 (CHOSEONG PIEUP-SIOS-THIEUTH) was encoded as the 
sequence <U+1107,U+1109,U+1110> and then composed (and similarly for the 
V and T jamos), so that a complete syllable was composed from a sequence 
of the form <L, L, L, V, V, V, T, T, T>.

I experimented with a patch that would support this, and the result 
looked OK (to my un-Korean eyes) when using the UnBatang font (not so 
good with Malgun Gothic). However, this is not canonically equivalent, 
and my understanding is that with Unicode having added all the complex 
jamos, there is no longer any real requirement or desire to support such 
sequences. So I haven't included this.
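
For contrast, the same kind of check shows that the "spelled out" leading-jamo sequence is not canonically equivalent to the complex jamo. Again a sketch with Python's `unicodedata`; the variable names are illustrative only:

```python
import unicodedata

complex_jamo = "\uA972"              # CHOSEONG PIEUP-SIOS-THIEUTH
spelled_out = "\u1107\u1109\u1110"   # PIEUP, SIOS, THIEUTH as separate jamos

# U+A972 has no canonical decomposition, so normalization leaves it alone.
has_decomposition = unicodedata.decomposition(complex_jamo) != ""

# The NFD forms differ, so the two spellings are distinct text to Unicode.
equivalent = (unicodedata.normalize("NFD", complex_jamo)
              == unicodedata.normalize("NFD", spelled_out))
```

Both `has_decomposition` and `equivalent` are false, so supporting the spelled-out form would be a shaper-specific extension and not something normalization requires.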

JK


