[HarfBuzz] Tai Tham NGA, SAKOT is not Kinzi

Theppitak Karoonboonyanan thep at linux.thai.net
Wed Apr 3 23:10:53 PDT 2013


On Thu, Apr 4, 2013 at 1:10 AM, Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:
> On Wed, 3 Apr 2013 11:01:48 +0700
> Theppitak Karoonboonyanan <thep at linux.thai.net> wrote:
>
>> On Wed, Apr 3, 2013 at 4:53 AM, Richard Wordingham
>> <richard.wordingham at ntlworld.com> wrote:
>> > On 7 January 2013, Behdad Esfahbod replied to Theppitak
>> > Karoonboonyanan:
>> >
>> >>> - Final NGA (U+1A59) with virama following is not reordered after
>> >>>   the next base consonant (at the end of line 4).
>
>> So, my proposal for Lao Tham is to apply above rule with U+1A58
>> instead.
>
> Of course, this rule should not be applied for Tai Khuen, at least, not
> for the modern style.
>
>> And Lao Tham fonts can then provide two identical glyphs for U+1A58
>> and U+1A59, with only the former affected by the rule.
>
>> > A noteworthy example is Northern Thai _tanglai_ <LOW TA, MAI KANG
>> > LAI, SIGN LA, AA, SAKOT, YA> 'all', 'many', where MAI KANG LAI
>> > almost always starts above the initial consonant.  This may be
>> > because SIGN LA is part of the same syllable as LOW TA.  The
>> > textbook showing it between consonants shows it, in this case,
>> > between SIGN LA and the vowel AA.
>
> Correction: For SIGN LA, read SAKOT, LA.

Is that recommended by Unicode? Why use <SAKOT, LA> when
MEDIAL LA is available? Especially for Lao Tham, there are two variants
of MEDIAL LA (U+1A56 SIGN MEDIAL LA, and the other similar to
U+1A57 SIGN LA TANG LAI). So, one should be explicit about which one to
use, a situation similar to that of U+1A63 VOWEL AA and U+1A64 VOWEL
TALL AA.

>> > Is there a problem with supporting this variety in positioning?
>
>> Can it be distinguished with the presence of following SAKOT?
>
>> For example:
>
>> <SA, MAI KANG LAI, LOW KHA, VOWEL E, VOWEL AA>
>> = MAI KANG LAI above SA
>
>> <SA, MAI KANG LAI, SAKOT, LOW KHA, VOWEL E, VOWEL AA>
>> = MAI KANG LAI above LOW KHA
>
> I believe a Tai Khuen font would render them as:
>
> 1) SA, MAI KANG LAI, E, LOW KHA, AA
> 2) E, SA.KHA, MAI KANG LAI, AA
>
> where SA.KHA is a vertical stack.

What if it's preprocessed by the rendering engine?
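
A minimal sketch of the SAKOT-based distinction from my two example
sequences above, written in Python over character names rather than code
points. The function name, the token spelling, and the small consonant
set are hypothetical and only cover the letters mentioned in this thread;
this is an illustration of the proposal, not an existing shaper rule.

```python
# Consonant names that appear in this thread only (an assumption, not a
# complete Tai Tham consonant list).
CONSONANTS = {"SA", "LOW KHA", "LOW TA", "LA", "YA"}


def mai_kang_lai_host(seq):
    """Return the consonant that MAI KANG LAI would sit above.

    Proposed rule: if SAKOT immediately follows MAI KANG LAI, the mark
    goes above the next consonant; otherwise it stays above the
    preceding consonant. Returns None if no such consonant is found.
    """
    i = seq.index("MAI KANG LAI")
    if i + 1 < len(seq) and seq[i + 1] == "SAKOT":
        # Look forward, past the SAKOT, for the next consonant.
        for name in seq[i + 2:]:
            if name in CONSONANTS:
                return name
        return None
    # No SAKOT: look backward for the nearest preceding consonant.
    for name in reversed(seq[:i]):
        if name in CONSONANTS:
            return name
    return None
```

For the _tanglai_ sequence <LOW TA, MAI KANG LAI, SAKOT, LA, AA, SAKOT,
YA> this rule would pick LA, while the mark is reported to start above
the initial LOW TA, so that word is a counterexample to the rule as
sketched.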

> Also note that _tanglai_ is <LOW TA, MAI KANG LAI, SAKOT, LA, AA,
> SAKOT, YA>, but the LA is subscript.  (I should have typed it first and
> then written out the code points.  It might exist with SIGN LA instead
> of SAKOT, LA, but I haven't noticed it written that way.)

This is interesting. Which shape would result from <SAKOT, LA>:
MEDIAL LA or LA TANG LAI? What about other words with medial LA?

> So that idea would not work.
>
> I'm not sure that we should be encoding writings with MAI KANG LAI
> on the first and on the second consonant differently any more than
> we encode _dam_ differently depending on whether the MAI KANG is
> written on the DA or the AA.  In either case, we encode it <DA, AA, MAI
> KANG> and leave it to the renderer to decide, do we not?  (Obviously
> the position is significant in contractions like boomaa <BA, MAI KANG,
> TONE-1, SAKOT, MA, AA>.)

Now I wonder how far MAI KANG and MAI KANG LAI are shifted to the left
in Khuen/Lanna. You compared it with MAI KANG on vowel AM.
For Lao Tham, the shift is not as far as the position in "boomaa".
It is shifted at most to the midpoint between the consonant and vowel AA.
But in "boomaa", it is centered right above BA.

So, the shift for MAI KANG is just a matter of style in this case, which
can be handled by GPOS or similar. The point is that it still appears
near the vowel AA. That makes me wonder about the degree of the shift in
Khuen/Lanna, especially for MAI KANG LAI.

>  I must admit I had thought that a shaper
> would have problems with words like _tam_ <TA, TONE-1, AA, MAI KANG>
> (optionally swapping TONE-1 and MAI KANG) and would need special
> processing for words like _luup_ <RA, UU, SAKOT, BA>, which looks just
> like the Sanskrit fragment <RA, SAKOT, BA, UU>.

The former, if it's really to be done for Tham, is similar to the case of Thai
SARA AM (U+0E33), which rendering engines already handle.
The latter is somewhat different, but of similar complexity, I suppose.
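
For reference, the Thai SARA AM handling can be sketched roughly as
below: SARA AM (U+0E33) is split into NIKHAHIT (U+0E4D) and SARA AA
(U+0E32), and the NIKHAHIT is moved in front of any tone marks that
precede it. This is a simplified Python illustration, not the code of any
particular engine; the function name is hypothetical, and treating only
U+0E48..U+0E4B as tone marks is a simplifying assumption.

```python
SARA_AM = "\u0e33"
NIKHAHIT = "\u0e4d"
SARA_AA = "\u0e32"
# Simplifying assumption: only the four Thai tone marks are skipped over.
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def decompose_sara_am(text):
    """Split each SARA AM and reorder NIKHAHIT before preceding tone marks."""
    out = []
    for ch in text:
        if ch != SARA_AM:
            out.append(ch)
            continue
        # Walk back over tone marks sitting directly before SARA AM.
        i = len(out)
        while i > 0 and out[i - 1] in TONE_MARKS:
            i -= 1
        out.insert(i, NIKHAHIT)  # NIKHAHIT goes before the tone marks
        out.append(SARA_AA)      # SARA AA stays where SARA AM was
    return "".join(out)
```

A Tham counterpart for <TA, TONE-1, AA, MAI KANG> would presumably need
the same kind of swap between the tone mark and MAI KANG.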

Regards,
--
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/


