[HarfBuzz] an issue regarding discrepancy between Korean and Unicode standards
Behdad Esfahbod
behdad at behdad.org
Thu Apr 4 20:07:42 PDT 2013
On 13-04-04 08:42 PM, Dohyun Kim wrote:
> 2013/4/5 Behdad Esfahbod <behdad at behdad.org>:
>> Hi,
>>
>> Can you please tell me what the desired rendering of that sequence with malgun
>> is? With Uniscribe I get three disjoint glyphs. Is *that* the desired rendering?
>>
>
> You need malgun.ttf from *Windows 8*. The version of the font should
> be 6.2 or higher. I guess you have an old version of malgun.
Got it. Pushed a fix to master.
Upon more testing, it looks like Uniscribe always decomposes Hangul syllables
to their jamos. We currently don't. Is that desirable? In my Hangul corpus
that's very rare (only ten cases out of millions).
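The decomposition Uniscribe appears to perform can be illustrated with the arithmetic Hangul syllable decomposition from the Unicode Standard (section 3.12, "Conjoining Jamo Behavior"). This is a minimal Python sketch for illustration only; the function names `decompose_syllable` and `decompose_text` are hypothetical and this is not HarfBuzz's or Uniscribe's actual code.

```python
# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables (U+AC00..U+D7A3)


def decompose_syllable(cp: int) -> list[int]:
    """Return the conjoining jamo for a precomposed syllable; other code points pass through."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    jamo = [L_BASE + s_index // N_COUNT,
            V_BASE + (s_index % N_COUNT) // T_COUNT]
    t_index = s_index % T_COUNT
    if t_index:  # index 0 means "no trailing consonant"
        jamo.append(T_BASE + t_index)
    return jamo


def decompose_text(text: str) -> str:
    """Hypothetical helper: decompose every precomposed Hangul syllable in text."""
    return "".join(chr(j) for ch in text for j in decompose_syllable(ord(ch)))


print(" ".join(f"U+{ord(c):04X}" for c in decompose_text("\uAC00\u11F0")))
```

Applied to the problematic "U+AC00 U+11F0" sequence from the original report, this yields U+1100 U+1161 U+11F0, i.e. the fully decomposed jamo form.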
behdad
> Then, we can test with UnBatang which is available at
> http://kldp.net/frs/download.php/4706/UnBatang_0613.ttf
> This font behaves almost the same as malgun.ttf.
>
> Attached is a sample result file using UnBatang. The right side of the
> result is the correct one. The command line to get the left side was:
> hb-view --output-file=gang1.pdf --script=hang UnBatang_0613.ttf < gang.txt
> and the one for the right side was:
> hb-view --shaper=coretext --output-file=gang2.pdf --script=hang UnBatang_0613.ttf < gang.txt
>
> As I am on a Mac, Uniscribe is not available. But I guess that
> Uniscribe will give us the same result as CoreText.
>
> Regards,
>
>>
>> On 13-03-20 11:03 PM, Dohyun Kim wrote:
>>> Hi,
>>>
>>> When a sample input string, say "U+1100 U+1161 U+11F0", is processed
>>> by the current version of HarfBuzz with some fonts, e.g. malgun.ttf
>>> bundled with Windows 8, we get "U+AC00 U+11F0", which gives a poor
>>> visual result.
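The composition described here is ordinary Unicode NFC behavior, and it can be reproduced with Python's standard `unicodedata` module (Python's own normalizer, used here only for illustration, not HarfBuzz's code). U+11F0 lies outside the modern trailing-consonant range U+11A8..U+11C2 that the Hangul composition algorithm can fold into a precomposed syllable, so only the leading consonant and the vowel combine.

```python
import unicodedata

# U+1100 CHOSEONG KIYEOK, U+1161 JUNGSEONG A, U+11F0 JONGSEONG YESIEUNG (old jamo)
sample = "\u1100\u1161\u11F0"

nfc = unicodedata.normalize("NFC", sample)
print(" ".join(f"U+{ord(c):04X}" for c in nfc))  # U+AC00 U+11F0

# Decomposing again restores the original jamo: the two forms are canonically equivalent.
print(unicodedata.normalize("NFD", nfc) == sample)  # True
```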
>>>
>>> The reason is that there is a discrepancy between the Korean industrial
>>> standard (KS X 1026-1:2007) and the Unicode normalization rules.
>>> Malgun.ttf observes the Korean standard only and does not care about
>>> the international Unicode standard for normalization. FYI, an English
>>> translation of KS X 1026-1 is available at
>>> ftp://std.dkuug.dk/ftp.anonymous/JTC1/SC2/WG2/docs/n3422.pdf.
>>>
>>> Normalization done by the current HarfBuzz is of course compliant with
>>> the Unicode standard. "U+AC00 U+11F0", i.e. a precomposed character in
>>> the Hangul Syllables block followed by a trailing consonant jamo, is
>>> perfectly legal and is canonically equivalent to "U+1100 U+1161
>>> U+11F0". According to KS X 1026-1, however, this should not occur.
>>> Section 5.3 of the Korean standard says: "A Wanseong syllable
>>> block (U+AC00..U+D7A3) cannot be recomposed with Johab Hangul
>>> letters (U+1100..U+11FF U+A960..U+A97C U+D7B0..U+D7FB) to represent
>>> another Hangul syllable block." See also section 6.4 of this
>>> standard.
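One way to respect the section 5.3 rule is to compose a syllable cluster into a precomposed (Wanseong) syllable only when the whole cluster fits into one, and otherwise keep every jamo of that cluster decomposed. The Python sketch below illustrates that idea under stated assumptions: it is a simplified, hypothetical normalizer (`ksx1026_normalize` and its helpers are invented names), it is not the sample code from the standard's appendix, and it does not model fillers or the standard's other rules.

```python
# Unicode Hangul algorithm constants; jamo ranges follow the Johab letter ranges quoted above.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT
S_COUNT = L_COUNT * N_COUNT


def is_l(cp): return 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C  # leading consonants
def is_v(cp): return 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6  # vowels
def is_t(cp): return 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB  # trailing consonants


def decompose(cp):
    """Split a precomposed syllable into jamo; other code points pass through."""
    s = cp - S_BASE
    if not 0 <= s < S_COUNT:
        return [cp]
    out = [L_BASE + s // N_COUNT, V_BASE + (s % N_COUNT) // T_COUNT]
    if s % T_COUNT:
        out.append(T_BASE + s % T_COUNT)
    return out


def compose_cluster(cluster):
    """Return one precomposed syllable if the whole L V (T) cluster fits, else the jamo unchanged."""
    if len(cluster) in (2, 3):
        l, v = cluster[0] - L_BASE, cluster[1] - V_BASE
        t = cluster[2] - T_BASE if len(cluster) == 3 else 0
        if 0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT \
                and (len(cluster) == 2 or t > 0):
            return [S_BASE + (l * V_COUNT + v) * T_COUNT + t]
    return cluster


def ksx1026_normalize(text: str) -> str:
    """Hypothetical: never leave a precomposed syllable combined with Johab letters."""
    cps = [j for ch in text for j in decompose(ord(ch))]
    out, i = [], 0
    while i < len(cps):
        start = i
        while i < len(cps) and is_l(cps[i]):
            i += 1
        while i < len(cps) and is_v(cps[i]):
            i += 1
        while i < len(cps) and is_t(cps[i]):
            i += 1
        if i == start:  # not a conjoining jamo: copy it through
            out.append(cps[i])
            i += 1
        else:
            out.extend(compose_cluster(cps[start:i]))
    return "".join(map(chr, out))
```

With this sketch, both "U+1100 U+1161 U+11F0" and "U+AC00 U+11F0" come out as the fully decomposed U+1100 U+1161 U+11F0, while a modern cluster such as U+1100 U+1161 U+11A8 still composes to U+AC01.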
>>>
>>> I have hesitated to post this issue, as HarfBuzz follows the Unicode
>>> normalization rules. We cannot say it is a bug, and many other
>>> libraries, including GLib and ICU, do the same as HarfBuzz. I
>>> believe that font developers should care about the Unicode standard as
>>> well, which some fonts (jieupsida and hcr-lvt) already support.
>>> But as there are other fonts (malgun.ttf and unbatang.ttf) which do
>>> not give us a good result with the current HarfBuzz, I am now raising
>>> this issue. Above all, malgun.ttf is now the default Hangul font for
>>> the most widely used OS here in Korea. I have little knowledge about
>>> programming languages, but the Korean standard mentioned above has
>>> some sample code in its appendix.
>>>
>
> --
> Dohyun Kim
> College of Law, Dongguk University
> Seoul, Republic of Korea
>
--
behdad
http://behdad.org/