[HarfBuzz] Tai Tham / Lanna (iso15924="lana") shaping question

Andrew Cunningham lang.support at gmail.com
Wed May 23 20:58:01 PDT 2012


I had a quick look at the Unicode Standard and at the original proposals
for Tai Tham.

One important point in the standard is that it makes a distinction
between the Khmer model and the standard Indic model. I haven't looked
specifically at Khmer, so I will need to unpack this idea more. The
standard says: "The encoding model for Tai Tham is more similar to the
Khmer coeng model than to the usual virama model".

The Unicode Standard also makes a distinction between subjoined
consonants and dependent consonant signs. Subjoined consonants are
formed using the sequence <consonant> + SAKOT + <consonant>, and the
subjoined consonant may be the start of a new syllable: although it is
part of a grapheme cluster, it isn't necessarily part of the same
syllable as the preceding part of that grapheme cluster. So there are
potentially some interesting issues there with cursor insertion points.

Regarding dependent consonant signs, the spec says "Seven dependent
consonant signs occur. Two of these are used as medials: U+1A55 tai
tham consonant sign medial ra and U+1A56 tai tham consonant sign
medial la form clusters and immediately follow a consonant."

This would imply to me that Tai Tham dependent consonant signs are
used in a similar way to subjoined consonants, but without a
virama/sakot, i.e. <consonant> + <dep-consonant>.
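
To make the two patterns concrete, here is a rough Python sketch of how a
consonant stack could be split into a base, SAKOT-joined consonants and
dependent consonant signs. It is only an illustration of my reading above,
not how HarfBuzz does it. The seven dependent sign code points are the ones
mentioned in this thread; the letter range U+1A20..U+1A54 and the function
name consonant_stack are my own assumptions.

```python
# Illustrative sketch only; not HarfBuzz's Indic machine.
SAKOT = 0x1A60  # U+1A60 TAI THAM SIGN SAKOT

# Dependent consonant signs mentioned in this thread (no SAKOT needed).
DEPENDENT_SIGNS = {0x1A55, 0x1A56, 0x1A57, 0x1A5B, 0x1A5C, 0x1A5D, 0x1A5E}

# Assumption: treat U+1A20..U+1A54 as letters that can act as a base
# or be subjoined. A real shaper would use proper character classes.
LETTERS = range(0x1A20, 0x1A55)


def consonant_stack(text, start=0):
    """Parse one consonant stack beginning at index `start`.

    Returns (base, parts, next_index), or None if no letter is found
    at `start`. `parts` lists ("subjoined", cp) for SAKOT + consonant
    and ("dependent-sign", cp) for a dependent consonant sign.
    """
    cps = [ord(ch) for ch in text]
    if start >= len(cps) or cps[start] not in LETTERS:
        return None
    base = cps[start]
    parts = []
    i = start + 1
    while i < len(cps):
        if cps[i] == SAKOT and i + 1 < len(cps) and cps[i + 1] in LETTERS:
            # <consonant> + SAKOT + <consonant>
            parts.append(("subjoined", cps[i + 1]))
            i += 2
        elif cps[i] in DEPENDENT_SIGNS:
            # <consonant> + <dep-consonant>
            parts.append(("dependent-sign", cps[i]))
            i += 1
        else:
            break  # vowel signs, tone marks, etc. are not handled here
    return base, parts, i
```

Under those assumptions, consonant_stack("\u1A38\u1A56\u1A6E") gives a base
of U+1A38 with U+1A56 attached as a dependent sign and stops before the
vowel sign U+1A6E, with no SAKOT involved.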

Just my two cents worth so far.

Will keep digging.

Section 14 of N3207R is useful, as it delves in much more detail into
consonant conjoining behaviours and notes that the Tai Tham encoding
model is similar to Khmer and Myanmar in using sakot + consonant as well
as medial consonants.

Andrew

On 24 May 2012 12:47, Behdad Esfahbod <behdad at behdad.org> wrote:
> On 05/23/2012 07:22 PM, Andrew Cunningham wrote:
>> I was wondering if Myanmar, etc. should be included in the Indic engine, or whether
>> it would be better to fork the Indic engine rather than overloading it with
>> divergent rules?
>
> Well, we will see how it goes.  I definitely don't want a soup like the old
> Indic shaper I had to maintain in Pango.  That said, we have made design
> decisions early on in the new Indic shaper that make me more confident trying
> to shape a wider continuum of scripts using one code base.  We'll adjust the
> design / decision as we integrate more scripts.
>
> The nice thing this time is, we have an extensive test suite, so we can make
> changes confidently.
>
>
> behdad
>
>>
>> On Thursday, 24 May 2012, Behdad Esfahbod <behdad at behdad.org> wrote:
>>> On 05/23/2012 06:48 PM, Andrew Cunningham wrote:
>>>> I think what Ed is saying is that Tai Tham follows a similar model to Myanmar
>>>> rather than a pure Indic model, where you have distinct medials vs. subjoined
>>>> consonants, where subjoined consonants require a virama and medials don't
>>>
>>> I see.  Thanks for the clarification.
>>>
>>>> Part of a fundamental change between Myanmar in Unicode 4.1 and 5.1
>>>
>>> Good to know.  I'll give HB a run on my Myanmar corpus and see if I can fix a
>>> few high-impact issues.
>>>
>>>> Will look at my sources to confirm for Tai Tham.
>>>
>>> Thanks,
>>> b
>>>
>>>> A.
>>>>
>>>> On Thursday, 24 May 2012, Behdad Esfahbod <behdad at behdad.org> wrote:
>>>>> Hi Thep,
>>>>>
>>>>> Humm, the message from Ed that you are replying to never made it to me or to
>>>>> the list.  Replies inline.
>>>>>
>>>>>
>>>>> On 05/23/2012 06:53 AM, Theppitak Karoonboonyanan wrote:
>>>>>> Hi, Ed, Behdad,
>>>>>>
>>>>>> On Sun, May 20, 2012 at 3:45 AM, Ed Trager <ed.trager at gmail.com> wrote:
>>>>>>> On Fri, May 18, 2012 at 5:48 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
>>>>>>>> On 05/18/2012 04:02 PM, Ed Trager wrote:
>>>>>>>>>
>>>>>>>>> In Tai Tham, U+1A6E VOWEL SIGN E needs to be shifted all the way to
>>>>>>>>> the left so that the final visual appearance would be:
>>>>>>>>
>>>>>>>> Are you sure?  Without U+1A60 TAI THAM SIGN SAKOT before the subjoined
>>>>>>>> consonant?  Reading Unicode suggests that you need that sign between PA
>>>>>>>> and LA.
>>>>>>>
>>>>>>> For most subjoined consonants, yes, that's true.  But note in
>>>>>>> particular that U+1A56 MEDIAL LA and U+1A57 MEDIAL LA TANG LAI were
>>>>>>> encoded separately.  In the case of these two "LA" signs, I believe
>>>>>>> there are two reasons justifying the separate encoding:
>>>>>>>
>>>>>>> (1) These are variant forms of the same subjoined letter LA:
>>>>>>> apparently, there is no other good way to do it other than encoding
>>>>>>> both.
>>>>>>>
>>>>>>> (2) Both of these LA signs can be part of triple consonant clusters,
>>>>>>> e.g. "KLW" appears in the common Thai / Tai word for banana,
>>>>>>> กล้วย, "klwy".  In Tai Tham, both the L and the W appear as
>>>>>>> below-base stacked forms (and actually the "y" is also a subjoined
>>>>>>> form, but it's kind of hanging off the right side of the whole stack).
>>>>>
>>>>> I'm not questioning the separate encoding.  I don't care :-).  What I'm saying
>>>>> is that you need a SAKOT before them for them to be considered part of the
>>>>> same syllable according to the Indic OpenType spec and my implementation.
>>>>> Now, if you think Unicode intended these to subjoin without a SAKOT, then I'd
>>>>> like you to point me to documentation about that.
>>>>>
>>>>> If that is the case, we would need changes to the Indic machine.  Not
>>>>> impossible, but I first want to make sure that it is indeed the case.
>>>>>
>>>>> behdad
>>>>>
>>>>>
>>>>>
>>>>>>> There are some other separately-encoded subjoining consonant signs:
>>>>>>> U+1A5B, U+1A5C, U+1A5D, U+1A5E.
>>>>>>
>>>>>> Please also count U+1A55 (MEDIAL RA) in the rule, although it's not a
>>>>>> subjoined form.
>>>>>>
>>>>>> Regards,
>>>>>> -Thep.
>>>>> _______________________________________________
>>>>> HarfBuzz mailing list
>>>>> HarfBuzz at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
>>>>>
>>>>
>>>> --
>>>> Andrew Cunningham
>>>> Senior Project Manager, Research and Development
>>>> Vicnet
>>>> State Library of Victoria
>>>> Australia
>>>>
>>>> andrewc at vicnet.net.au
>>>> lang.support at gmail.com
>>>
>>



-- 
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library of Victoria
Australia

andrewc at vicnet.net.au
lang.support at gmail.com


