[HarfBuzz] Hangul GSUB features

Dohyun Kim nomosnomos at gmail.com
Fri Jan 24 19:14:16 PST 2014


2014/1/25  <mskala at ansuz.sooke.bc.ca>:
>
> These fonts are intended to be able to typeset the full range of hangul
> defined in Unicode - including both the precomposed syllable code points and
> the (basic and extended) individual jamo.  So I want to be able to
> typeset all these code point sequences, and typeset them identically, using
> a single glyph that is a precomposed syllable:
>
>    1. U+1100 U+1161 U+11B7 (choseong-kiyeok jungseong-a jongseong-mieum)
>    2. U+AC00 U+11B7        (syllable-ga jongseong-mieum)
>    3. U+AC10               (syllable-gam)
>
> I'm not an expert on Unicode canonical equivalence, but I believe these
> three sequences are canonically equivalent to each other under the rules
> in sections 3.7 and 3.12 of the current Unicode standard
> (http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf).  Sequence 1 is the
> canonical decomposition of all three.  If I'm reading the discussion of the
> last few days correctly, it sounds like we're all more or less in agreement
> on that.
>
> I would also like to be able to typeset the extended compound jamo as nicely
> as possible.  For instance, I would like these two sequences to both be
> typeset with a single glyph that is a precomposed lead jamo cluster, to be
> overlaid with additional glyphs for subsequent code points that would
> describe the vowel and tail of the syllable:
>
>    4. U+1107 U+1109 U+1110 (choseong-pieup choseong-sios choseong-thieuth)
>    5. U+A972               (choseong-pieup-sios-thieuth)
>
> Exactly which glyph is used for these two sequences should be
> context-sensitive, determined by the following vowel and presence or absence
> of a tail.  It looks to me like these may not be canonically equivalent
> under Unicode; U+A972 does not canonically decompose, and I don't think
> there is such a thing as canonical composition of jamo.  Nonetheless it
> certainly appears that they should be understood as the same text,
> describing the same fragment of a syllable.
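The equivalence claims quoted above can be checked quickly with Python's standard `unicodedata` module. Python is only my choice for the illustration; nothing in the thread depends on it. NFD and NFC treat sequences 1-3 as canonically equivalent. U+A972 has no decomposition mapping, and normalization never merges leading consonants, so sequences 4 and 5 stay distinct.

```python
import unicodedata

seq1 = "\u1100\u1161\u11B7"  # choseong-kiyeok, jungseong-a, jongseong-mieum
seq2 = "\uAC00\u11B7"        # syllable-ga, jongseong-mieum
seq3 = "\uAC10"              # syllable-gam

# Sequences 1-3 share one NFD form (sequence 1) and one NFC form (sequence 3).
for s in (seq1, seq2, seq3):
    assert unicodedata.normalize("NFD", s) == seq1
    assert unicodedata.normalize("NFC", s) == seq3

seq4 = "\u1107\u1109\u1110"  # choseong-pieup, choseong-sios, choseong-thieuth
seq5 = "\uA972"              # choseong-pieup-sios-thieuth

# U+A972 has no decomposition mapping, so the two never normalize together.
print(repr(unicodedata.decomposition(seq5)))  # prints ''
print(unicodedata.normalize("NFC", seq4) == unicodedata.normalize("NFC", seq5))  # prints False
```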

Even after the syllable decomposition table is removed from ccmp and the
recomposition table from liga, the jieubsida fonts will still have a
problem with case 4, that is, a sequence of several jamo that should be
composed into a single compound jamo. If the current hangul shaper could
be modified so that it does not touch this sort of jamo sequence, then
everybody would be happy.

In other words, I propose that the hangul shaper should not compose the
following jamo sequences into precomposed syllables, even though they
are composable:

        <L, L, V> should not be composed to <L, LV>, but left untouched.
        <L, V, V> should not be composed to <LV, V>, but left untouched.
        <L, L, V, T> should not be composed to <L, LVT>, but left untouched.
        <L, V, T, T> should not be composed to <LVT, T>, but left untouched.
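To make the proposed rule concrete, here is a minimal Python sketch. It is a hypothetical illustration only, not HarfBuzz's hangul shaper. The names `jamo_class`, `compose_cluster` and `shape` are made up. A cluster is assumed to be a maximal L* V* T* run of conjoining jamo, and precomposed syllables in the input pass through unchanged. A cluster is composed with the standard Unicode Hangul syllable arithmetic only when it is exactly <L, V> or <L, V, T> of modern jamo. Everything else is left untouched.

```python
# Hypothetical sketch of the proposed rule (not HarfBuzz code): compose a
# conjoining-jamo cluster only when it is exactly <L, V> or <L, V, T>.

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
ORDER = {"L": 0, "V": 1, "T": 2}


def jamo_class(ch: str) -> str:
    """Classify a code point as leading (L), vowel (V), trailing (T) jamo, or other (X)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    return "X"


def compose_cluster(cluster: str) -> str:
    """Compose <L, V> or <L, V, T> of modern jamo; return anything else as-is."""
    if "".join(jamo_class(c) for c in cluster) not in ("LV", "LVT"):
        return cluster  # e.g. <L, L, V> or <L, V, T, T>: leave untouched
    l, v = ord(cluster[0]), ord(cluster[1])
    t = ord(cluster[2]) if len(cluster) == 3 else T_BASE  # T_BASE means "no tail"
    if not (L_BASE <= l <= 0x1112 and V_BASE <= v <= 0x1175 and t <= 0x11C2):
        return cluster  # old-hangul jamo: no precomposed syllable exists
    return chr(S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + (t - T_BASE))


def shape(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        if jamo_class(text[i]) != "L":
            out.append(text[i])  # non-jamo, syllables and orphan V/T pass through
            i += 1
            continue
        j = i + 1
        # Extend the cluster while the classes stay in L* V* T* order.
        while (j < len(text) and jamo_class(text[j]) != "X"
               and ORDER[jamo_class(text[j])] >= ORDER[jamo_class(text[j - 1])]):
            j += 1
        out.append(compose_cluster(text[i:j]))
        i = j
    return "".join(out)
```

With this rule, `shape("\u1100\u1161\u11B7")` still yields U+AC10 (case 1), while `shape("\u1100\u1100\u1161")` comes back unchanged. By contrast, Unicode NFC composes greedily: `unicodedata.normalize("NFC", "\u1100\u1100\u1161")` yields "\u1100\uAC00", the <L, LV> split that the proposal wants to avoid.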

Can this request be easily incorporated into the current hangul shaper?
If so, the remaining processing could be done with OpenType GSUB features
such as ccmp, calt, or other "global" features, even if the *jmo
features are not applicable. If not, I don't mind that much, as these
jamo sequences are not valid under the Korean national standard
KS X 1026-1.

Best,
-- 
Dohyun Kim
Seoul, Republic of Korea

