[HarfBuzz] hangul shaper patches

Dohyun Kim nomosnomos at gmail.com
Mon Jan 20 15:45:07 PST 2014


2014/1/21 Jonathan Kew <jfkthame at googlemail.com>:
> On 20/1/14 15:26, Dohyun Kim wrote:
>
>>
>> I have just tested this kind of input string and the result is a
>> little disappointing:
>> Input string <U+1107,U+1109,U+1110,U+1161> is not rendered well. The
>> output of current (patched) harfbuzz with UnBatang font is
>> [uni1121=0+1024|uniD0C0=2+1024], the expected output being
>> [uniA972.xxxx|uni1161.xxxx]
>
>
> IIRC, <U+1107,U+1109,U+1110> is *not* canonically equivalent to U+A972, even
> though it may be perfectly logical to spell the complex jamo as a sequence
> of simpler jamo letters.
>
>
>>
>> The reason seems to be that we are currently applying "ccmp" opentype
>> feature too late. If "ccmp" feature could be applied before the
>> process of hangul shaper, the issue would disappear.
>
>
> Currently, this example fails because the pair <U+1110,U+1161> gets composed
> to U+D0C0 during the preprocess_text function, and so by the time any
> OpenType features are applied, it's too late.
>
> Fixing this is tricky within the current structure of the shaper, as the
> main hangul shaper function needs to run before we map the Unicode
> characters to glyphs, but the ccmp feature needs to run after the default
> char-to-glyph mapping has been done.
>
> Is this actually important? Note that Windows behaves similarly, and so data
> that has "spelled-out" representations of complex jamos won't work there
> either. AIUI, the recommended practice is to use the precomposed Unicode
> characters such as U+A972 directly - and because these do *not* have
> decompositions, mixing the two forms will lead to confusion and problems for
> users. Perhaps it's better that the non-preferred spelling does not render
> "correctly".
>

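The two normalization points quoted above can be checked outside HarfBuzz.
The following is a minimal sketch that uses Python's standard unicodedata
module as a stand-in; it is not the shaper's preprocess_text code, and the
helper name nfc is only illustrative.

```python
import unicodedata

def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)

spelled_out = "\u1107\u1109\u1110"   # PIEUP, SIOS, THIEUTH as separate jamo
precomposed = "\uA972"               # single complex choseong

# The leading-consonant + vowel pair composes to a precomposed syllable.
print(nfc("\u1110\u1161") == "\uD0C0")

# The spelled-out consonant cluster does not normalize to U+A972,
# and U+A972 itself has no canonical decomposition.
print(nfc(spelled_out) == precomposed)
print(unicodedata.decomposition(precomposed) == "")

# In the full test string only the last two code points compose.
print(nfc(spelled_out + "\u1161") == "\u1107\u1109\uD0C0")
```

Under plain canonical normalization, then, the U+1110 in the test string is
absorbed into U+D0C0 and is no longer available as a separate jamo.
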
Though <U+1107,U+1109,U+1110> is not allowed in KS X 1026-1, IMHO it
is allowed by the Unicode Standard. Please see
http://www.unicode.org/reports/tr29/tr29-23.html#Standard_Korean_Syllables
.
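
As an illustration of the definition linked above, a standard Korean
syllable block built from conjoining jamo is one or more leading consonants
(L), then one or more vowels (V), then zero or more trailing consonants (T).
The sketch below is a simplified check under that assumption: it covers only
conjoining jamo, ignores precomposed syllables, and the function name is
hypothetical.

```python
import re

# Conjoining jamo ranges by Hangul_Syllable_Type.
L = "\u1100-\u115F\uA960-\uA97C"   # leading consonants (choseong)
V = "\u1160-\u11A7\uD7B0-\uD7C6"   # vowels (jungseong)
T = "\u11A8-\u11FF\uD7CB-\uD7FB"   # trailing consonants (jongseong)

# L+ V+ T*, jamo only; precomposed LV/LVT syllables are not handled here.
SYLLABLE_BLOCK = re.compile(f"[{L}]+[{V}]+[{T}]*")

def is_jamo_syllable_block(text):
    """Return True if text is exactly one L+ V+ T* jamo sequence."""
    return SYLLABLE_BLOCK.fullmatch(text) is not None

# Both the spelled-out and the single-jamo spellings fit the pattern.
print(is_jamo_syllable_block("\u1107\u1109\u1110\u1161"))
print(is_jamo_syllable_block("\uA972\u1161"))
```

Both spellings satisfy this pattern, which is why the spelled-out sequence
can be read as permitted there even though KS X 1026-1 excludes it.
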

Until three or four years ago, jamo sequences such as
<U+1107,U+1109,U+1110> were unavoidable. But we no longer use that sort
of jamo sequence. So the issue under discussion affects only old Hangul
documents that were written several years ago and have not been revised yet.

Yes, I agree with Jonathan. Supporting those old documents is not that
important. In any case, those documents should be revised sooner or later
if they have not been revised already. I also admit that the part of the
Unicode Standard linked above is, for the most part, a legacy of the past
kept only for backward compatibility.

So I am totally satisfied with the patches. Many thanks to Jonathan,
Behdad, and everybody for your work on the Hangul shaper.

Best Regards,
-- 
Dohyun Kim
Seoul, Republic of Korea

