[HarfBuzz] Tai Tham NGA, SAKOT is not Kinzi

Theppitak Karoonboonyanan thep at linux.thai.net
Wed May 1 01:50:54 PDT 2013


On Wed, May 1, 2013 at 2:39 PM, Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:
> On Tue, 30 Apr 2013 14:31:29 +0700
> Theppitak Karoonboonyanan <thep at linux.thai.net> wrote:
>
>> So, assuming we choose the SAKOT-less encoding scheme, let me
>> summarize the issue for Behdad:
>
>> U+1A58 TAI THAM SIGN MAI KANG LAI has a special property.
>> There are two styles of rendering.
>
>> 1. In Lao Tham and traditional Lanna, it's placed on the next base
>> consonant, like Myanmar Kinzi. So, the word "ᩈᩘᨥᩮᩣ" <U+1A48 HIGH SA,
>> U+1A58 MAI KANG LAI, U+1A25 LOW KHA, U+1A6E VOWEL E,
>> U+1A63 VOWEL AA> should be rendered as:
>
>> <HIGH SA, VOWEL E, LOW KHA, MAI KANG LAI, VOWEL AA>
>
>> 2. In Khuen and modern Lanna, it's placed on the first base consonant,
>> although a little bit shifted to the right. So, the same word above
>> should be rendered as:
>
>> <HIGH SA, MAI KANG LAI, VOWEL E, LOW KHA, VOWEL AA>
>
>> That is, only the leading vowel is reordered.
>
>> Therefore, we need the rendering engine to reorder MAI KANG LAI when
>> a certain feature exists in the font. Richard suggests two choices:
>> 'pref' and 'rphf'.
>
>> The reordering should be similar to Myanmar Kinzi. The substituted
>> Kinzi glyph should be placed next to the next base consonant, or a
>> dotted circle if no valid base consonant exists. The reordering may
>> take place before the leading vowel is reordered.
>
> We also need to say that it should ignore SAKOT when reordering.  I
> hope that suffices.  I think the reordering rule that we want is that
> MAI KANG LAI is positioned after the next consonant or consonant
> sign. Thus <LOW TA, MAI KANG LAI, SAKOT, LA, AA, SAKOT, YA> would be
> reordered to <LOW TA, SAKOT, LA, MAI KANG LAI, AA, SAKOT, YA> and <LOW
> TA, MAI KANG LAI, LA TANG LAI, AA, SAKOT, YA> would be reordered to
> <LOW TA, LA TANG LAI, MAI KANG LAI, AA, SAKOT, YA>.

Seconded, provided that the SAKOT-less scheme is chosen.
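
As a rough illustration of the rule seconded above, here is a minimal Python sketch. It works on lists of character names as written in this thread rather than on code points, and the token classes and function names (`shift_mai_kang_lai`, `reorder_leading_vowels`) are hypothetical, not part of HarfBuzz. It assumes the SAKOT-less scheme and does not model dotted-circle insertion.

```python
MKL = "MAI KANG LAI"
SAKOT = "SAKOT"

# Illustrative token classes only (assumption); a real shaper would
# classify code points, not names.
CONSONANTS = {"HIGH SA", "LOW KHA", "LOW TA", "LA", "YA"}
CONSONANT_SIGNS = {"LA TANG LAI"}
PRE_VOWELS = {"VOWEL E"}


def shift_mai_kang_lai(seq):
    """Move each MAI KANG LAI after the next consonant or consonant sign,
    ignoring any SAKOT in between. Leave it in place if none follows."""
    out = list(seq)
    i = 0
    while i < len(out):
        if out[i] == MKL:
            j = i + 1
            while j < len(out) and out[j] == SAKOT:
                j += 1
            if j < len(out) and (out[j] in CONSONANTS or out[j] in CONSONANT_SIGNS):
                # Rotate MAI KANG LAI to just after the target at index j.
                out[i:j + 1] = out[i + 1:j + 1] + [MKL]
                i = j + 1
                continue
        i += 1
    return out


def reorder_leading_vowels(seq):
    """Move each leading vowel in front of the base consonant of its cluster.
    A consonant directly preceded by SAKOT is treated as subjoined, not a base."""
    out = []
    base = None  # index in `out` of the current cluster's base consonant
    for tok in seq:
        if tok in CONSONANTS and (not out or out[-1] != SAKOT):
            base = len(out)
            out.append(tok)
        elif tok in PRE_VOWELS and base is not None:
            out.insert(base, tok)
            base += 1
        else:
            out.append(tok)
    return out


word = ["HIGH SA", MKL, "LOW KHA", "VOWEL E", "VOWEL AA"]
style1 = reorder_leading_vowels(shift_mai_kang_lai(word))  # shifting style
style2 = reorder_leading_vowels(word)                      # non-shifting style
```

With the sample word, `style1` and `style2` come out as the two renderings listed in the summary quoted earlier, and `shift_mai_kang_lai` applied to Richard's two LOW TA sequences gives the reorderings he describes.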

>> However, it may require extra check so that the reordering should not
>> happen if there is an upper vowel on the next base consonant, for
>> proper rendering of words like "ᩋᩘᨠᩕᩥ᩠ᩈ"
>> <A, MAI KANG LAI, HIGH KA, MEDIAL RA, VOWEL I, SAKOT, HIGH SA>
>> in Tai Khuen.
>
> (a) 'May' does not tell the shaper what to do.

I use 'may' because the exceptions exist only in non-shifting languages
such as Khuen, and arguably Lanna. For Lao Tham, usage of Mai Ang Lan, or
Mai Kang Lai, is quite limited. And the other uses of the same sign as Final
NGA (such as in ᩈ᩠ᨿᩙ/เสียง, ᨿᩩᩙ/ยุง) should be encoded with U+1A59 instead.

The uncertainty of 'may' seems to only exist in Lanna, which has not
been examined with real evidence yet.

> (b) That word is not a problem for a Tai Khuen font.  How should a Lao
> Tham font handle it?

No need to handle it, because Mai Kang Lai is not that heavily used in Lao
Tham. It should be spelt ᩋᩢ᩠ᨦᨠᩕᩥ᩠ᩈ instead.

>  How would it handle the Pali word _sankilesa_
> spelt with MAI KANG LAI - I would expect text written for a Tai Khuen
> font to use MAI KANG LAI.

A conflict like this is usually avoided in Lao Tham manuscripts by using
an alternative spelling, probably ᩈᨦ᩠ᨠᩥᩃᩮᩈ. (I haven't found a real
sample of this yet; I just infer it from the tendency of other cases.)

> (c) The problem here is that some styles, such as that of the
> Maefahluang dictionary of Northern Thai (MFL), place MAI KANG LAI on the
> first consonant when the second has a vowel above (SIGN I, SIGN II) or
> MEDIAL RA.  Now the MFL may be unusual; it uses MAI KANG LAI within a
> word where on general Indic principles (or is it just European
> conventions?) I would expect MAI KANG (= anusvara). Some examples of MAI
> KANG LAI being on the first syllable because the second is graphically
> unsuitable are:
>
> (i) ᩁᩘᩈᩦ <RA, MAI KANG LAI, SA, SIGN II> 'ray of light'
> (ii) ᩈᩘᨠᩕᩣᨶ᩠ᨲᩴ <HIGH SA, MAI KANG LAI, HIGH KA, MEDIAL RA, AA, NA,
> SAKOT, HIGH TA, RA HAAM> 'songkran'
>
> Do we need the shaper to cope with these exceptions?

The question probably applies to Lanna, not Khuen or Lao Tham.

(i) is usually spelt "ᩁᩢ᩠ᨦᩈᩦ" in Lao Tham manuscripts. I haven't found samples
for (ii) yet, but I tend to spell it "ᩈᩫ᩠ᨦᨠᩕᩣᨶ᩠ᨲ" or "ᩈᨦ᩠ᨠᩕᩣᨶ᩠ᨲ" from
what I have learnt.

For Khuen, it's quite clear that the font won't come with the triggering
feature.

So, the question is for Lanna. Do such spellings exist, and how are they
handled? If they exist, yes, the rendering engine 'must' handle them.
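
If the exceptions do turn out to be needed, the extra check could look something like the following Python sketch. It is only an assumption about how a shaper might guard the shift: the blocker set (vowel signs above and MEDIAL RA) is taken from the MFL examples quoted above, the token classes and function names are hypothetical, and only the SAKOT-less case of MAI KANG LAI directly followed by a consonant is handled.

```python
MKL = "MAI KANG LAI"
SAKOT = "SAKOT"

# Illustrative token classes only (assumption), using the names in this thread.
CONSONANTS = {"A", "RA", "SA", "HIGH SA", "HIGH KA", "LOW KHA", "NA", "HIGH TA"}
# Marks that make the second consonant graphically unsuitable.
BLOCKERS = {"SIGN I", "SIGN II", "MEDIAL RA"}


def next_base_is_blocked(seq, base):
    """Return True if the cluster whose base is at index `base` carries a blocker."""
    k = base + 1
    while k < len(seq):
        tok = seq[k]
        if tok in BLOCKERS:
            return True
        if tok == SAKOT:
            k += 2  # skip SAKOT and the consonant it subjoins
            continue
        if tok in CONSONANTS:
            break  # a new cluster starts here
        k += 1
    return False


def shift_unless_blocked(seq):
    """Move MAI KANG LAI after the following consonant unless that
    consonant's cluster carries a vowel above or MEDIAL RA."""
    out = list(seq)
    i = 0
    while i < len(out):
        if out[i] == MKL and i + 1 < len(out) and out[i + 1] in CONSONANTS:
            if not next_base_is_blocked(out, i + 1):
                out[i], out[i + 1] = out[i + 1], out[i]
                i += 2
                continue
        i += 1
    return out
```

Under these assumptions, examples (i) and (ii) and the Khuen word are left unchanged, while the sample word <HIGH SA, MAI KANG LAI, LOW KHA, VOWEL E, VOWEL AA> still has its MAI KANG LAI shifted.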

>  I'm still some way
> from being ready to test the ability of GPOS to undo shaping.  The sort
> of GPOS rule I have in mind is:
>
> Context (lookup type 7): consonant consonant <MAI KANG LAI>
> Lookup for context: At position 0, mark to base (lookup type 4),
> skipping bases and other marks, to position MAI KANG LAI on the
> consonant.
>
> In example (i) above, rearrangement would deliver <g(RA), g(SA), g(MAI
> KANG LAI), g(SIGN II)>, and we need to be able to attach MAI KANG LAI
> to RA.  I don't know if this sort of rule works.

I think having the rendering engine do it should be simpler.

>> As a side note, another choice which is not favored by the majority
>> is to encode the different spellings differently, by using SAKOT
>> after MAI KANG LAI when it requires the reordering. So, the shifting
>> version of the sample word above would be:
>> <HIGH SA, MAI KANG LAI, SAKOT, LOW KHA, VOWEL E, VOWEL AA>
>> while the non-shifting version would be:
>> <HIGH SA, MAI KANG LAI, LOW KHA, VOWEL E, VOWEL AA>.
>> Just FYI, it's not chosen by the majority, though.
>
> It doesn't work - _tanglai_!

There is a workaround, such as excluding SAKOT LA from the rule, as the
งฺล combination does not exist in Pali anyway. Or better than that,
amend Unicode!

Regards,
--
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/


