[HarfBuzz] Tai Tham NGA, SAKOT is not Kinzi

Richard Wordingham richard.wordingham at ntlworld.com
Wed May 1 00:39:41 PDT 2013


On Tue, 30 Apr 2013 14:31:29 +0700
Theppitak Karoonboonyanan <thep at linux.thai.net> wrote:

> So, assuming we choose the SAKOT-less encoding scheme, let me
> summarize the issue for Behdad:
 
> U+1A58 TAI THAM SIGN MAI KANG LAI has a special property.
> There are two styles of rendering.
 
> 1. In Lao Tham and traditional Lanna, it's placed on the next base
> consonant, like Myanmar Kinzi. So, the word "ᩈᩘᨥᩮᩣ" <U+1A48 HIGH SA,
> U+1A58 MAI KANG LAI, U+1A25 LOW KHA, U+1A6E VOWEL E,
> U+1A63 VOWEL AA> should be rendered as:
 
> <HIGH SA, VOWEL E, LOW KHA, MAI KANG LAI, VOWEL AA>
 
> 2. In Khuen and modern Lanna, it's placed on the first base consonant,
> although a little bit shifted to the right. So, the same word above
> should be rendered as:
 
> <HIGH SA, MAI KANG LAI, VOWEL E, LOW KHA, VOWEL AA>
 
> That is, only the leading vowel is reordered.
 
> Therefore, we need the rendering engine to reorder MAI KANG LAI when
> a certain feature exists in the font. Richard suggests two choices:
> 'pref' and 'rphf'.

> The reordering should be similar to Myanmar Kinzi. The substituted
> Kinzi glyph should be placed next to the next base consonant, or a
> dotted circle if no valid base consonant exists. The reordering may
> take place before the leading vowel is reordered.

We also need to say that it should ignore SAKOT when reordering.  I
hope that suffices.  I think the reordering rule that we want is that
MAI KANG LAI is positioned after the next consonant or consonant
sign. Thus <LOW TA, MAI KANG LAI, SAKOT, LA, AA, SAKOT, YA> would be
reordered to <LOW TA, SAKOT, LA, MAI KANG LAI, AA, SAKOT, YA> and <LOW
TA, MAI KANG LAI, LA TANG LAI, AA, SAKOT, YA> would be reordered to
<LOW TA, LA TANG LAI, MAI KANG LAI, AA, SAKOT, YA>. 
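
A minimal sketch of that rule in Python, operating on lists of character
names rather than glyphs.  The name sets and the function name
reorder_mai_kang_lai are illustrative assumptions, not HarfBuzz API, and
only the characters used in the two examples above are listed.  If no
consonant or consonant sign follows, the sketch leaves MAI KANG LAI where
it is; dotted-circle insertion is not modelled.

```python
MAI_KANG_LAI = "MAI KANG LAI"
SAKOT = "SAKOT"

# Illustrative subsets only (assumption): a real shaper would classify
# characters from the Unicode data, not from a hand-written list.
CONSONANTS = {"LOW TA", "LA", "YA"}
CONSONANT_SIGNS = {"LA TANG LAI"}


def reorder_mai_kang_lai(seq):
    """Move each MAI KANG LAI to just after the next consonant or
    consonant sign, ignoring any SAKOT that directly follows it."""
    out = list(seq)
    i = 0
    while i < len(out):
        if out[i] != MAI_KANG_LAI:
            i += 1
            continue
        # Skip over SAKOT when looking for the landing position.
        j = i + 1
        while j < len(out) and out[j] == SAKOT:
            j += 1
        if j < len(out) and (out[j] in CONSONANTS or out[j] in CONSONANT_SIGNS):
            mark = out.pop(i)      # the target is now at index j - 1
            out.insert(j, mark)    # so index j is just after the target
            i = j + 1
        else:
            i += 1                 # nothing to move onto: leave in place
    return out


with_sakot = ["LOW TA", MAI_KANG_LAI, SAKOT, "LA", "AA", SAKOT, "YA"]
with_sign = ["LOW TA", MAI_KANG_LAI, "LA TANG LAI", "AA", SAKOT, "YA"]
print(reorder_mai_kang_lai(with_sakot))
print(reorder_mai_kang_lai(with_sign))
```

The two inputs are the sequences quoted above, and the sketch returns the
two reordered sequences given there.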

> However, it may require extra check so that the reordering should not
> happen if there is an upper vowel on the next base consonant, for
> proper rendering of words like "ᩋᩘᨠᩕᩥ᩠ᩈ"
> <A, MAI KANG LAI, HIGH KA, MEDIAL RA, VOWEL I, SAKOT, HIGH SA>
> in Tai Khuen.

(a) 'May' does not tell the shaper what to do.

(b) That word is not a problem for a Tai Khuen font.  How should a Lao
Tham font handle it?  How would it handle the Pali word _sankilesa_
spelt with MAI KANG LAI?  I would expect text written for a Tai Khuen
font to use MAI KANG LAI.

(c) The problem here is that some styles, such as that of the
Maefahluang dictionary of Northern Thai (MFL), place MAI KANG LAI on the
first consonant when the second has a vowel above (SIGN I, SIGN II) or
MEDIAL RA.  Now the MFL may be unusual; it uses MAI KANG LAI within a
word where on general Indic principles (or is it just European
conventions?) I would expect MAI KANG (= anusvara). Some examples of MAI
KANG LAI being on the first syllable because the second is graphically
unsuitable are:

(i) ᩁᩘᩈᩦ <RA, MAI KANG LAI, SA, SIGN II> 'ray of light'
(ii) ᩈᩘᨠᩕᩣᨶ᩠ᨲᩴ <HIGH SA, MAI KANG LAI, HIGH KA, MEDIAL RA, AA, NA,
SAKOT, HIGH TA, RA HAAM> 'songkran'
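
For concreteness, the MFL-style exception could be expressed as a check
made before any reordering.  This is only a sketch on character names:
the sets, the function name stays_on_first_consonant, and the treatment
of SAKOT plus the following consonant as part of the same cluster are
all assumptions for illustration, not something a shaper is known to do.

```python
MAI_KANG_LAI = "MAI KANG LAI"
SAKOT = "SAKOT"

# Illustrative subsets only (assumption), covering the examples above.
CONSONANTS = {"RA", "SA", "HIGH SA", "HIGH KA", "HIGH TA", "LOW KHA", "NA"}
# Signs that make the second consonant graphically unsuitable in MFL style.
BLOCKING_SIGNS = {"SIGN I", "SIGN II", "MEDIAL RA"}


def stays_on_first_consonant(seq):
    """Return True if the first MAI KANG LAI in seq should stay on the
    consonant before it.  Raises ValueError if seq has no MAI KANG LAI."""
    k = seq.index(MAI_KANG_LAI) + 1
    while k < len(seq) and seq[k] == SAKOT:
        k += 1
    if k >= len(seq) or seq[k] not in CONSONANTS:
        return True                # no following consonant to move onto
    k += 1
    # Scan the marks belonging to the second consonant's cluster.
    while k < len(seq):
        item = seq[k]
        if item == SAKOT:
            k += 2                 # skip SAKOT and the subjoined consonant
            continue
        if item in CONSONANTS:
            break                  # a new base consonant ends the cluster
        if item in BLOCKING_SIGNS:
            return True
        k += 1
    return False


example_i = ["RA", MAI_KANG_LAI, "SA", "SIGN II"]
example_ii = ["HIGH SA", MAI_KANG_LAI, "HIGH KA", "MEDIAL RA", "AA",
              "NA", SAKOT, "HIGH TA", "RA HAAM"]
plain = ["HIGH SA", MAI_KANG_LAI, "LOW KHA", "VOWEL E", "VOWEL AA"]
print(stays_on_first_consonant(example_i))   # True
print(stays_on_first_consonant(example_ii))  # True
print(stays_on_first_consonant(plain))       # False
```

Examples (i) and (ii) are blocked by SIGN II and MEDIAL RA respectively,
while the earlier word with LOW KHA, VOWEL E and VOWEL AA is not.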

Do we need the shaper to cope with these exceptions?  I'm still some way
from being ready to test the ability of GPOS to undo shaping.  The sort
of GPOS rule I have in mind is:

Context (lookup type 7): consonant consonant <MAI KANG LAI>
Lookup for context: At position 0, mark to base (lookup type 4),
skipping bases and other marks, to position MAI KANG LAI on the
consonant.

In example (i) above, rearrangement would deliver <g(RA), g(SA), g(MAI
KANG LAI), g(SIGN II)>, and we need to be able to attach MAI KANG LAI
to RA.  I don't know if this sort of rule works.

> As a side note, another choice which is not favored by the majority
> is to encode the different spellings differently, by using SAKOT
> after MAI KANG LAI when it requires the reordering. So, the shifting
> version of the sample word above would be:
> <HIGH SA, MAI KANG LAI, SAKOT, LOW KHA, VOWEL E, VOWEL AA>
> while the non-shifting version would be:
> <HIGH SA, MAI KANG LAI, LOW KHA, VOWEL E, VOWEL AA>.
> Just FYI, it's not chosen by the majority, though.

It doesn't work - _tanglai_!

Richard.


