[HarfBuzz] Hangul Shaper (was Re: an issue regarding discrepancy between Korean and Unicode standards)
Behdad Esfahbod
behdad at behdad.org
Tue Apr 9 13:29:24 PDT 2013
Hi,
Ok, what you describe sounds very close to the OpenType spec:
http://www.microsoft.com/typography/otfntdev/hangulot/
and what the ICU Layout Hangul shaper does.
The one part I don't understand is the section "Compose Old Hangul Jamo
combinations" under:
http://www.microsoft.com/typography/otfntdev/hangulot/shaping.htm
I can't make sense of that part, since Appendix B does not list what the jamos
compose to.
Please review those documents and share any insights you may have. I'll go
ahead with implementing a shaper then.
behdad
On 13-04-06 01:32 PM, Dohyun Kim wrote:
> 2013/4/6 Behdad Esfahbod <behdad at behdad.org>:
>> On 13-04-05 06:45 AM, Dohyun Kim wrote:
>>> 2013/4/5 Dohyun Kim <nomosnomos at gmail.com>:
>>>> Sorry for the noise.
>>>> I have booted on Windows machine and tested uniscribe a bit. My guess
>>>> on how uniscribe works on Hangul is:
>>>>
>>>> 1. decompose hangul syllables to jamos
>>>>
>>>> 2. compose single jamos into composite jamos wherever possible,
>>>> e.g., U+1100 U+1100 => U+1101
>>>> Note: mapping table for this composition is available at
>>>> ftp://ktug.org/ktug/hcr-lvt/composejamotojamo.map
>>>>
>>>
>>> Well, after a bit more testing, it turned out that this second step is
>>> not what uniscribe does. Sorry for the wrong information. I had
>>> guessed this on the basis of the old Unicode standard. Unicode no
>>> longer recommends using multiple single jamos to get a composite
>>> jamo.
>>>
>>> Instead, uniscribe inserts fillers (U+115F U+1160) around a lone
>>> single jamo that does not make up a syllable block.
>>
>> Interesting. So, for a lone T jamo, both 115F and 1160 are inserted?
>
> Yes, when fillers are inserted. But actually uniscribe does not seem
> to insert fillers. Sorry for my premature conclusion. Today I
> downloaded the harfbuzz win32 binary and tested some jamo texts using
> hb-shape. This utility gave me more accurate information than I could
> obtain with the naked eye. Contrary to my expectation, the output of
> hb-shape did not have any traces of fillers. So, it seems evident
> that uniscribe does not insert fillers. It seems equally evident
> that uniscribe sets boundaries between syllable blocks, so that
> multiple single jamos cannot be concatenated into a composite jamo.
>
> Let us suppose an input text <U+1100 U+AC00 U+11F0>. My guess at what
> uniscribe does:
>
> 1. decompose syllables to jamos: we get <U+1100 U+1100 U+1161 U+11F0>
>
> 2. demarcate each syllable block by setting boundaries in-between: we
> get <U+1100 | U+1100 U+1161 U+11F0> where | means syllable boundary.
> Probably this is related to the so-called "cluster." Yesterday I
> mistook this boundary (maybe ZWNJ, but I am not sure) for a filler.
> BTW, according to the old standard, U+1100 U+1100 are concatenated into
> U+1101, so the result would be a single syllable block <U+1101 U+1161
> U+11F0>. Nowadays we do not need this jamo-to-jamo composition,
> because all the jamos known to date have been registered since
> Unicode version 5.2.
>
> 3. try to re-compose jamos into syllable letters. But as our sample
> text matches the case of <L V OT>, nothing is converted.
>
> 4. apply font features: we get <U+1100 | U+1100.s U+1161.s U+11F0.s>
> where ".s" denotes a substituted glyph.
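
A minimal Python sketch of steps 1 and 2 above. It assumes that a syllable
block is at most one L, one V and one optional T, which is what the sample
boundary suggests but is not confirmed uniscribe behaviour. The function
names (decompose, syllable_blocks) are hypothetical; the decomposition
arithmetic is the standard Unicode Hangul syllable algorithm.

```python
# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables


def is_L(cp):
    # Leading consonants: Hangul Jamo plus Hangul Jamo Extended-A.
    return 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C


def is_V(cp):
    # Medial vowels: Hangul Jamo plus Hangul Jamo Extended-B.
    return 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6


def is_T(cp):
    # Trailing consonants: Hangul Jamo plus Hangul Jamo Extended-B.
    return 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB


def decompose(cps):
    """Step 1: replace every precomposed syllable with its jamos."""
    out = []
    for cp in cps:
        s = cp - S_BASE
        if 0 <= s < S_COUNT:
            out.append(L_BASE + s // N_COUNT)
            out.append(V_BASE + (s % N_COUNT) // T_COUNT)
            t = s % T_COUNT
            if t:
                out.append(T_BASE + t)
        else:
            out.append(cp)
    return out


def syllable_blocks(jamos):
    """Step 2 (assumed rule): split into blocks of L V [T]; any code point
    that does not start such a block becomes a block of its own."""
    blocks, i = [], 0
    while i < len(jamos):
        j = i + 1
        if is_L(jamos[i]) and j < len(jamos) and is_V(jamos[j]):
            j += 1
            if j < len(jamos) and is_T(jamos[j]):
                j += 1
        blocks.append(jamos[i:j])
        i = j
    return blocks


jamos = decompose([0x1100, 0xAC00, 0x11F0])
blocks = syllable_blocks(jamos)
```

For the sample text, blocks holds a lone U+1100 followed by the block
U+1100 U+1161 U+11F0, matching the boundary shown in step 2.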
>
> As I said before, we Koreans do not input text like <U+AC00 U+11F0> in
> practice. However, there remains some possibility that some
> applications or libraries pass Unicode-normalized text to harfbuzz,
> in which case harfbuzz would give an incorrect result. So I
> changed my mind, and now I suggest implementing a Hangul shaper.
> It is not an urgent task, though; harfbuzz works quite well already.
> However, we want harfbuzz to be as perfect as possible.
>
> Regards,
>
>
>>>> 3. compose jamos into Hangul syllables wherever possible
>>>> Note: this process complies with KS X 1026-1. In other words, the jamo
>>>> sequence <L V> in <L V OT> is *not* converted to LV, where L means
>>>> leading consonant, V means medial vowel, OT means *old* trailing
>>>> consonant (U+11C3..U+11FF U+D7CB..U+D7FB), and LV means Hangul
>>>> syllable equivalent to L V.
>>>>
>>>> 4. apply opentype layout features
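
The composition rule in step 3 can be sketched in Python as follows. This
is only an illustration under stated assumptions: the input is one syllable
block given as a list of code points, and compose_block is a hypothetical
name, not a harfbuzz or uniscribe function. The point it shows is that
<L V> is left alone whenever the block continues with anything other than
a modern trailing consonant, such as an old T.

```python
# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def is_modern_L(cp):
    return L_BASE <= cp < L_BASE + L_COUNT      # U+1100..U+1112


def is_modern_V(cp):
    return V_BASE <= cp < V_BASE + V_COUNT      # U+1161..U+1175


def is_modern_T(cp):
    return T_BASE < cp < T_BASE + T_COUNT       # U+11A8..U+11C2


def compose_block(block):
    """Compose <L V> or <L V T> into one precomposed syllable, but only if
    every jamo in the block is modern. A block such as <L V OT> is returned
    unchanged, so that <L V> is not turned into LV in front of an old T."""
    if len(block) < 2:
        return block
    if not (is_modern_L(block[0]) and is_modern_V(block[1])):
        return block
    lv = S_BASE + ((block[0] - L_BASE) * V_COUNT + (block[1] - V_BASE)) * T_COUNT
    if len(block) == 2:
        return [lv]
    if len(block) == 3 and is_modern_T(block[2]):
        return [lv + (block[2] - T_BASE)]
    return block


composed = compose_block([0x1100, 0x1161])            # modern <L V>
untouched = compose_block([0x1100, 0x1161, 0x11F0])   # <L V OT>
```

Here composed is the single syllable U+AC00, while untouched is still the
three jamos, ready for the font features of step 4.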
>>>>
>>>> It is somewhat complicated but gives a perfect result. It satisfies
>>>> both the Korean and Unicode standards. Nevertheless, what harfbuzz
>>>> currently does is quite excellent as well and I am satisfied with it. I
>>>> am reporting this just for reference.
>>>>
>
--
behdad
http://behdad.org/