[HarfBuzz] Tai Tham NGA, SAKOT is not Kinzi

Theppitak Karoonboonyanan thep at linux.thai.net
Sun Apr 21 08:07:53 PDT 2013


 On Sat, Apr 13, 2013 at 6:34 AM, Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:
> On Fri, 12 Apr 2013 15:04:52 +0700
> Theppitak Karoonboonyanan <thep at linux.thai.net> wrote:
>
>> On Thu, Apr 11, 2013 at 6:08 AM, Richard Wordingham
>> <richard.wordingham at ntlworld.com> wrote:
>>> On Wed, 10 Apr 2013 13:08:06 +0700
>>> Theppitak Karoonboonyanan <thep at linux.thai.net> wrote:
>
>> I'd like to see some sample, for education's sake. Especially, is it
>> explicitly explained in words?
>
> I've uploaded some examples of mai kang lai associated with the
> preceding consonant at
> http://homepage.ntlworld.com/richard.wordingham/lanna/maikanglai.pdf .
> There is not much explanation.

Thank you for the doc.

> The Tai Khuen sample happens to include an example of _ruup_ written
> with <SAKOT, BA> below and to the right of RA and with UU to the right
> of <SAKOT, BA>.

Yes, I think I have seen similar samples in Lao manuscripts, too.

>> >> If so, and if "tanglai" invalidates the use of kinzi model, how
>> >> about having the rendering engine preprocess it without SAKOT? For
>> >> example: <SA, MAI KANG LAI, LOW KHA, E, AA> ->
>> >> <SA, E, LOW KHA, MAI KANG LAI, AA>.
>> >
>> > If a font is to replicate the style of the handbook, can a GPOS
>> > table effectively rearrange the latter back to look like SA, MAI
>> > KANG LAI, E, LOW KHA, AA?
>>
>> The same question applies to the other style as well. Can the GPOS
>> do the same to shift it to second consonant on the lack of
>> preprocessing?
>
> I believe mai kang lai should normally be positioned using a 'mark to
> base' attachment.  That requires that the base precede the mark in glyph
> order.  So no, I believe GPOS cannot shift it to the second
> consonant.

Right. And reordering it that way may even exceed the practical limits
of GSUB. So it cannot be handled in the font alone, then.

> If GPOS works on the whole string, rather than just a single syllable,
> I think GPOS can undo the rearrangement.  My reading of the code is
> that GPOS works on the whole string.

I have no idea about this.

> I am therefore leaning to the view that the rendering engine should
> offer the capability to move MAI KANG LAI to after the next
> consonant or consonant symbol.  This supports _tanglai_ with mai
> kang lai.  This does not seem to fit in well with the form of the
> rearrangement rules.

Given the constraints we have, this might be needed. But please read
further.
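
To make the proposed capability concrete, here is a minimal Python
sketch of moving MAI KANG LAI to after the next consonant. It is not
HarfBuzz code: it works on character names rather than glyphs, and the
function name and the (deliberately incomplete) CONSONANTS set are made
up for illustration.

```python
# Illustrative only: tokens are character names, not glyph IDs.
MKL = "MAI KANG LAI"
# Hypothetical, incomplete consonant list; a real engine would use
# the full Tai Tham consonant repertoire.
CONSONANTS = {"SA", "HIGH SA", "LOW KA", "LOW KHA", "RA", "MA"}


def shift_mai_kang_lai(tokens):
    """Move each MAI KANG LAI to just after the consonant that follows it."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i] == MKL and out[i + 1] in CONSONANTS:
            # Swap so that the base consonant precedes the mark.
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the mark we just moved
        else:
            i += 1
    return out


print(shift_mai_kang_lai(["SA", MKL, "LOW KHA", "E", "AA"]))
# ['SA', 'LOW KHA', 'MAI KANG LAI', 'E', 'AA']
```

After the move the mark follows its intended base, which is what a
'mark to base' attachment needs. Reordering of the prevowel E is left
out of this sketch.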

It's too bad that SAKOT + LA is used in "tanglai". The problem would
have been avoided had the definitions of SAKOT + LA and MEDIAL LA
been swapped.

>> In fact, my font also features some complicated GSUB rules to handle
>> this, which I try to get rid of by the aids of rendering engine.
>> Meanwhile, doing so would cause the other school to bear the same
>> load instead.
>
> I think the rearrangement should be optional.  A font should signal to
> the rendering engine whether rearrangement should be performed.
>
>> The question is, which one should be the default, and which one
>> should be the exception?
>
> With my signalling idea, neither is the default.

This could be the solution we're seeking. But how should the font do the
signalling?

Another question is how to fall back on rendering engines that lack
Tai Tham support. Note that the SAKOT-less encoding scheme would
break my current implementation, as explained below.

>> As we discuss so far, only some parts of
>> Lanna (which I haven't seen myself yet) advocates the non-shifting
>> Mai Kang Lai, while Khuen, Lao, and the other parts of Lanna itself,
>> all advocate the shifting version. (That's why it's called "Mai Kang
>> Lai", isn't it?)
>
> Khuen has mai kang lai in the same syllable as the first
> consonant; Lao in the same syllable as the second consonant.

This made me go back and read your thread-starting post more carefully.
Yes, you said Mai Kang Lai is shifted right to midway between the
first and the second consonant. I read that as the "shifting school".
We should probably check how "sangkho" is written in Khuen, then.

For Lanna, my friend from Chiang Mai has shown me a sample from a
book by อ. มณี พยอมยงค์, where Mai Kang Lai is clearly written on the
second consonant:
  https://dl.dropboxusercontent.com/u/12266813/TaiTham/manee_1.jpeg

This confirms that Lanna has two separate styles of writing.

>  Doesn't
> the name just come from the shape?  'Eel-like mai kang' would not be
> too bad an English name.

I don't think so.
- Eel is not usually called "Lai" alone. It's rather "Pla Lai" (literally meaning
  "running fish").
- "Pla Lai" is Central Thai language, not Northern or North-Eastern Thai.
  Eel is called "Yian" or "Pla Yian" in Northern Thai, or "IAN" in Lao and
  North-Eastern Thai. If the name were from the shape, it should have been
  called "Mai Kang Yian".
- "Mai Kang Lai" is called "Mai Ang Lan" in Lao, where "Lan" means
  "running".
- The tutorial clip I mentioned does explain how "Mai Kang Lai" gets its
  name.

>> Meanwhile, Lanna seems to be more dynamic in styles, which makes
>> things more complicated. For Lao, there is only one school. So, it's
>> less problematic to bear the complication. But what about the
>> shifting school of Khuen/Lanna?

I withdraw my claim that it would be less problematic to let Lao shift
Mai Kang Lai in the font. I've experimented with the SAKOT-less
encoding scheme and ran into a boundary problem with words
like <SA, MAI KANG LAI, LOW KHA, RA, HIGH TA, NA, HIGH PA,
NA, AA, MA> (สงฺฆรตนปณาม). With the SAKOT-less encoding scheme,
nothing marks where the shift should stop, so Mai Kang Lai keeps
being shifted over the following consonants. And because the rule
comprises multiple stages, the shift is incomplete and leaves
duplicates of Mai Kang Lai along the rendered text.
Getting around this would be tricky.
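
The runaway part of this can be reproduced with a toy model in Python.
This is only a sketch, not my font's GSUB rules: it works on character
names, the function names are made up, and it models only the repeated
shifting, not the duplicated glyphs.

```python
# Illustrative only: tokens are character names, not glyph IDs.
MKL = "MAI KANG LAI"
# Hypothetical, incomplete consonant list, enough for the example word.
CONSONANTS = {"SA", "LOW KHA", "RA", "HIGH TA", "NA", "HIGH PA", "MA"}


def apply_once(tokens):
    """One pass of a context rule <MKL, consonant> -> <consonant, MKL>."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i] == MKL and out[i + 1] in CONSONANTS:
            out[i], out[i + 1] = out[i + 1], out[i]
            return out, True
    return out, False


def apply_until_stable(tokens, max_passes=20):
    """Re-apply the rule until it no longer matches; count the shifts."""
    shifts = 0
    changed = True
    while changed and shifts < max_passes:
        tokens, changed = apply_once(tokens)
        if changed:
            shifts += 1
    return tokens, shifts


word = ["SA", MKL, "LOW KHA", "RA", "HIGH TA", "NA",
        "HIGH PA", "NA", "AA", "MA"]
result, shifts = apply_until_stable(word)
print(shifts)   # 6
print(result)
# ['SA', 'LOW KHA', 'RA', 'HIGH TA', 'NA', 'HIGH PA', 'NA',
#  'MAI KANG LAI', 'AA', 'MA']
```

Without a marker for the stacking position, the only thing that stops
the mark in this model is the first non-consonant (here AA).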

So, it would be less problematic only if SAKOT is used to mark
the stacking position. And, in fact, if you look at the Thai transcription
of the word, PHINTHU is used at that very position. It logically kills the
inherent vowel of Pali NGA.

In fact, in Lao/Esaan Tham tutorials, it's explained like that. Mai Kang Lai
(or Lao "Mai Ang Lan") represents Pali NGA being "subjoined" by the base
character.

So, the complication for fallback caused by the SAKOT-less scheme
makes me reconsider the use of SAKOT.

The ideal case might be to redefine U+1A56 to represent the spacing
MEDIAL LA (currently represented by SAKOT + LA), and SAKOT + LA
to mean the non-spacing one. Doing so would eliminate the "tanglai"
exception.

But amending the standard is not easy. Another possibility is to exclude
<MAI KANG LAI, SAKOT, LA> from the rule. This should have no ill
effect, because the NGA + SAKOT + LA (งฺล) combination does not exist
in Pali.
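
A rough Python sketch of such a rule with the exclusion follows. It is
an illustration under assumptions, not engine or font code: tokens are
character names, the function name and the short CONSONANTS set are
invented, and dropping SAKOT once the shift is done is my own
simplification.

```python
# Illustrative only: tokens are character names, not glyph IDs.
MKL = "MAI KANG LAI"
SAKOT = "SAKOT"
# Hypothetical, incomplete consonant list.
CONSONANTS = {"SA", "HIGH SA", "LOW KHA", "RA", "HIGH TA", "NA",
              "HIGH PA", "MA", "LA"}


def shift_over_sakot(tokens):
    """Rewrite <MKL, SAKOT, C> as <C, MKL>, except when C is LA."""
    out = []
    i = 0
    while i < len(tokens):
        if (tokens[i] == MKL
                and i + 2 < len(tokens)
                and tokens[i + 1] == SAKOT
                and tokens[i + 2] in CONSONANTS
                and tokens[i + 2] != "LA"):  # the "tanglai" exclusion
            # Assumption: SAKOT is consumed by the shift.
            out += [tokens[i + 2], MKL]
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


print(shift_over_sakot(["HIGH SA", MKL, SAKOT, "LOW KHA", "E", "AA"]))
# ['HIGH SA', 'LOW KHA', 'MAI KANG LAI', 'E', 'AA']

# Made-up context around the excluded triple: left untouched.
print(shift_over_sakot(["HIGH TA", MKL, SAKOT, "LA"]))
# ['HIGH TA', 'MAI KANG LAI', 'SAKOT', 'LA']
```

Because the rule needs a SAKOT to fire, it applies once per Mai Kang Lai
and cannot run on over the consonants that follow.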

Regarding the question about multiple forms of the same word,
that is already the case. For example, "sangkho" can be written as any of:
- <HIGH SA, MAI KANG, LOW KHA, E, AA>
- <HIGH SA, NGA, SAKOT, LOW KHA, E, AA>
- <HIGH SA, MAI KANG LAI, [SAKOT,] LOW KHA, E, AA>
What if we accept that the last one can be split into 2 different forms?
Just like the multiple forms of "tanglai", it should not be a surprise if there
exists a book that explains the several ways to write "sangkho" in Lanna
by considering the shifting and non-shifting Mai Kang Lai as different forms.

If so, I think it would make things a lot simpler and better defined.

>> Another way, which I think most users tend to abuse already, is to
>> encode it in semi-visual order, such as <HIGH SA, LOW KHA, MAI KANG
>> LAI, E, AA>. Can we accept that?
>
> I hate what that does to the spelling of Pali.  I don't think we want
> confusing variation in its spelling.

Exactly. We should avoid this.

>> > Going through the Tai Khuen passages in that book, I found a case
>> > (on p.150) where <HIGH SA, MAI KANG LAI, LOW KHA> had a line break
>> > before the LOW KHA.  That may have been a mistake, and I'm not sure
>> > how significant it is to rendering.
>>
>> I have only seen counter-examples in Lao Tham. For example, the word
>> <MA, U, MAI KANG LAI, LOW KA, U, RA> is seen in a palm leaf
>> manuscript to break between <U> and <MAI KANG LAI>, with <MAI KANG
>> LAI> over <LOW KA> on the next line.
>
> This difference in line-breaking is exactly what I would expect.  For
> applications, the fallback rule would be to not break on either side of
> MAI KANG LAI.

Yes, I think so.
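
As a rough illustration of that fallback rule, the Python sketch below
filters a list of candidate break positions so that none falls on
either side of MAI KANG LAI. It is not taken from any line breaker: the
function name is invented, tokens are character names, and where the
candidate positions come from is outside its scope.

```python
# Illustrative only: tokens are character names, not code points.
MKL = "MAI KANG LAI"


def filter_breaks(tokens, candidates):
    """Drop break positions adjacent to MAI KANG LAI.

    Position k means a break between tokens[k - 1] and tokens[k].
    """
    allowed = []
    for k in candidates:
        if 0 < k < len(tokens) and MKL in (tokens[k - 1], tokens[k]):
            continue  # would break just before or just after the mark
        allowed.append(k)
    return allowed


word = ["MA", "U", MKL, "LOW KA", "U", "RA"]
# Pretend every internal position is a candidate, just for the demo.
print(filter_breaks(word, range(1, len(word))))
# [1, 4, 5]
```

Positions 2 and 3, the two sides of MAI KANG LAI, are removed, so both
the Khuen-style and the Lao-style break are suppressed by default.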

Regards,
--
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/


