[HarfBuzz] an issue regarding discrepancy between Korean and Unicode standards

Dohyun Kim nomosnomos at gmail.com
Thu Apr 4 17:42:26 PDT 2013


2013/4/5 Behdad Esfahbod <behdad at behdad.org>:
> Hi,
>
> Can you please tell me what the desired rendering of that sequence with malgun
> is?  With Uniscribe I get three disjoint glyphs.  Is *that* the desired rendering?
>

You need malgun.ttf from *Windows 8*.  The version of the font should
be 6.2 or higher.  I guess you have an older version of malgun.

Alternatively, we can test with UnBatang, which is available at
  http://kldp.net/frs/download.php/4706/UnBatang_0613.ttf
This font behaves almost the same as malgun.ttf.

Attached is a sample result file using UnBatang.  The right side of the
result is the correct one.  The command line to get the left side was:
  hb-view --output-file=gang1.pdf --script=hang UnBatang_0613.ttf < gang.txt
and that of the right side was:
  hb-view --shaper=coretext --output-file=gang2.pdf --script=hang UnBatang_0613.ttf < gang.txt

As I am on a Mac, uniscribe is not available, but I guess that
uniscribe will give the same result as coretext.

Regards,

>
> On 13-03-20 11:03 PM, Dohyun Kim wrote:
>> Hi,
>>
>> When a sample input string, say "U+1100 U+1161 U+11F0", is processed
>> by the current version of harfbuzz with some fonts, e.g. malgun.ttf
>> bundled with Windows 8, we get something like "U+AC00 U+11F0", which
>> does not render well.
>>
>> The reason is that there is a discrepancy between the Korean industrial
>> standard (KS X 1026-1: 2007) and the Unicode normalization rule.
>> Malgun.ttf observes the Korean standard only and does not care about
>> the international Unicode standard for normalization.  FYI, an English
>> translation of KS X 1026-1 is available at
>> ftp://std.dkuug.dk/ftp.anonymous/JTC1/SC2/WG2/docs/n3422.pdf.
>>
>> Normalization done by current harfbuzz is of course compliant with
>> the Unicode standard.  "U+AC00 U+11F0", i.e. a precomposed character in
>> the Hangul syllable block followed by a trailing consonant Jamo letter, is
>> perfectly legal and is canonically equivalent to "U+1100 U+1161
>> U+11F0".  According to KS X 1026-1, however, this should not occur.
>> Section 5.3 of the Korean standard says: "A Wanseong syllable
>> block(U+AC00..U+D7A3) cannot be recomposed with Johab Hangul
>> letters(U+1100..U+11FF U+A960..U+A97C U+D7B0..U+D7FB) to represent
>> another Hangul syllable block."  See also section 6.4 of this
>> standard.
>>
>> I have hesitated to post this issue, as harfbuzz is observing the
>> Unicode normalization rule.  We cannot say it is a bug, and many other
>> libraries, including glib and icu, do the same as harfbuzz.  I
>> believe that font developers should care about the Unicode standard as
>> well, which some fonts (jieupsida and hcr-lvt) already support.
>> But as there are other fonts (malgun.ttf and unbatang.ttf) which do
>> not give good results with current harfbuzz, I am now raising this
>> issue.  Above all, malgun.ttf is now the default Hangul font for the
>> most widely used OS here in Korea.  I have little knowledge of
>> programming languages, but the Korean standard mentioned above has
>> some sample code in its appendix.
>>
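
For anyone who wants to reproduce the normalization behaviour described
in the quoted message without a font, here is a minimal sketch using only
Python's standard unicodedata module.  It is not the sample code from the
KS X 1026-1 appendix and not what harfbuzz does internally.  The helper
names (codepoints, syllable_extended_by_jamo) are made up for
illustration, and the jamo ranges checked are my own simplification of
section 5.3: only medial vowels and trailing consonants, since a leading
consonant after a syllable would simply start a new syllable.

```python
import unicodedata

# Precomposed (Wanseong) Hangul syllables.
SYLLABLE_FIRST, SYLLABLE_LAST = 0xAC00, 0xD7A3

# Assumption: medial vowels and trailing consonants only
# (Hangul Jamo U+1160..U+11FF and Hangul Jamo Extended-B U+D7B0..U+D7FB).
EXTENDING_JAMO_RANGES = ((0x1160, 0x11FF), (0xD7B0, 0xD7FB))


def codepoints(text):
    """Format a string as 'U+XXXX U+XXXX ...' for easy comparison."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def is_syllable(ch):
    return SYLLABLE_FIRST <= ord(ch) <= SYLLABLE_LAST


def is_extending_jamo(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTENDING_JAMO_RANGES)


def syllable_extended_by_jamo(text):
    """True if a precomposed syllable is directly followed by a medial
    vowel or trailing consonant jamo, the mixture KS X 1026-1 rules out."""
    return any(
        is_syllable(a) and is_extending_jamo(b)
        for a, b in zip(text, text[1:])
    )


sample = "\u1100\u1161\u11F0"
nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", nfc)

print(codepoints(nfc))                    # U+AC00 U+11F0
print(codepoints(nfd))                    # U+1100 U+1161 U+11F0
print(syllable_extended_by_jamo(nfc))     # True
print(syllable_extended_by_jamo(sample))  # False
```

U+11F0 lies outside the range of trailing consonants that Unicode
composes algorithmically (U+11A8..U+11C2), so NFC composes only the first
two jamo and leaves U+11F0 behind.  Decomposing again with NFD restores
the all-jamo sequence.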

--
Dohyun Kim
College of Law, Dongguk University
Seoul, Republic of Korea
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gang2-nup.pdf
Type: application/pdf
Size: 5352 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20130405/d2275d09/attachment.pdf>

