[HarfBuzz] hangul shaper patches
Jonathan Kew
jfkthame at googlemail.com
Mon Jan 20 00:12:03 PST 2014
On 20/1/14 02:21, Roozbeh Pournader wrote:
> Jonathan,
>
> I was wondering if the new patches would have all the canonically
> equivalent character sequences rendered the same way. Microsoft people
> have said publicly that their Hangul shaper intentionally doesn't do that.
>
The intention is that canonically equivalent sequences should render the
same. I'm aware that MS doesn't do this in certain cases, as mentioned:
> (b) a
> handful of words where there's an <LV, T> sequence that uniscribe
> doesn't support (it has no corresponding LVT syllable), but we
> handle by decomposing to <L, V, T> and applying jamo features.
An example of this is <U+B4C0,U+11F0>, where uniscribe (using Malgun
Gothic) renders the two default, unshaped glyphs for U+B4C0 (an LV
syllable) and U+11F0 (a trailing jamo) separately, while harfbuzz
decomposes U+B4C0 into separate leading- and vowel-jamo glyphs and then
applies ljmo/vjmo/tjmo features so that the three jamos are properly
composed into a single syllable block.
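
For anyone who wants to check the U+B4C0 case by hand, here is a rough Python sketch of the Hangul syllable arithmetic described in the Unicode Standard (section 3.12). The function names are made up for illustration; this is not the harfbuzz code, just the same arithmetic written out.

```python
# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # syllables per leading jamo
S_COUNT = L_COUNT * N_COUNT   # total precomposed syllables


def decompose_syllable(cp):
    """Return the jamo code points of a precomposed Hangul syllable.

    Hypothetical helper: gives [L, V] for an LV syllable and
    [L, V, T] for an LVT syllable.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    if t_index == 0:
        return [l, v]                 # LV syllable, no trailing jamo
    return [l, v, T_BASE + t_index]   # LVT syllable


def has_precomposed_lvt(t):
    """True if trailing jamo t can combine with an LV syllable into a
    precomposed LVT syllable (only U+11A8..U+11C2 can)."""
    return T_BASE < t < T_BASE + T_COUNT
```

With these definitions, decompose_syllable(0xB4C0) gives the two jamos U+1103 and U+1172, and has_precomposed_lvt(0x11F0) is false, which is why there is no LVT syllable to fall back on for this sequence.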
Thus, with harfbuzz the two sequences
<U+B4C0,U+11F0>
<U+1103,U+1172,U+11F0>
render the same. As I understand things, the Korean standard says the
former spelling should not be used, but IMO that cannot override the
fact that the Unicode standard defines them as canonically equivalent,
so rendering them identically is correct.
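
The canonical equivalence of these two spellings can be checked with Python's standard unicodedata module. This is only a quick sketch; the variable and function names are mine, and it says nothing about how any shaper behaves, only about what Unicode normalization does with the two strings.

```python
import unicodedata

# The two spellings discussed above.
lv_t = "\uB4C0\u11F0"           # <LV, T>
l_v_t = "\u1103\u1172\u11F0"    # <L, V, T>


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


equivalent = canonically_equivalent(lv_t, l_v_t)

# NFC composes L+V into the LV syllable but leaves U+11F0 separate,
# because no precomposed LVT syllable exists for that trailing jamo.
nfc_of_jamos = unicodedata.normalize("NFC", l_v_t)
```

Here equivalent comes out True, and nfc_of_jamos is the same string as lv_t.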
What the patched harfbuzz still -doesn't- implement is shaping "spelled
out" versions of Old Hangul sequences with multiple L, V and/or T jamos.
The old MS Hangul spec gave an example where the leading jamo now
encoded at U+A972 (CHOSEONG PIEUP-SIOS-THIEUTH) was encoded as the
sequence <U+1107,U+1109,U+1110> and then composed (and similarly for the
V and T jamos), so that a complete syllable was composed from a sequence
of the form <L, L, L, V, V, V, T, T, T>.
I experimented with a patch that would support this, and the result
looked OK (to my un-Korean eyes) when using the UnBatang font (not so
good with Malgun Gothic). However, a spelled-out sequence like this is not
canonically equivalent to the single complex jamo, and my understanding is
that with Unicode having added all the complex jamos, there is no longer
any real requirement or desire to support such sequences. So I haven't
included this.
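
The lack of canonical equivalence for the U+A972 example can be confirmed the same way, again with Python's unicodedata module (names here are illustrative only, and the result depends on the Unicode data shipped with the Python build, which needs to include Hangul Jamo Extended-A):

```python
import unicodedata

spelled_out = "\u1107\u1109\u1110"   # PIEUP, SIOS, THIEUTH as three leading jamos
complex_jamo = "\uA972"              # the single complex leading jamo


def nfd(s):
    """Canonical decomposition of s."""
    return unicodedata.normalize("NFD", s)


# U+A972 has no canonical decomposition, so normalization never maps
# it to (or from) the three-jamo spelling.
same_after_nfd = nfd(spelled_out) == nfd(complex_jamo)
decomposition = unicodedata.decomposition(complex_jamo)
```

same_after_nfd is False and decomposition is the empty string, so a shaper that treated the two spellings alike would be going beyond what canonical equivalence requires.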
JK