[HarfBuzz] Tai Tham / Lanna (iso15924="lana") shaping question

Behdad Esfahbod behdad at behdad.org
Wed May 23 19:47:33 PDT 2012


On 05/23/2012 07:22 PM, Andrew Cunningham wrote:
> I was wondering if Myanmar, etc. should be included in the Indic engine, or whether
> it would be better to fork the Indic engine rather than overloading it with
> divergent rules?

Well, we will see how it goes.  I definitely don't want a soup like the old
Indic shaper I had to maintain in Pango.  That said, we have made design
decisions early on in the new Indic shaper that make me more confident trying
to shape a wider continuum of scripts using one code base.  We'll adjust the
design / decision as we integrate more scripts.

The nice thing this time is, we have an extensive test suite, so we can make
changes confidently.


behdad
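
A note for readers following the quoted thread below: the open question there
is whether a medial sign such as U+1A56 joins the preceding consonant's
syllable without U+1A60 SAKOT, or whether only SAKOT + consonant does.  The toy
Python sketch below contrasts the two readings.  It is not HarfBuzz code and
not the Indic shaper's state machine; the function name, the `medials_join`
flag, and the treatment of U+1A20..U+1A4C as base letters are assumptions made
purely for illustration, and vowel signs and tone marks are ignored.

```python
SAKOT = 0x1A60

# Separately encoded consonant signs named in this thread.
MEDIAL_SIGNS = {0x1A55, 0x1A56, 0x1A57, 0x1A5B, 0x1A5C, 0x1A5D, 0x1A5E}


def is_letter(cp):
    # Assumption: treat U+1A20..U+1A4C as Tai Tham consonant letters.
    return 0x1A20 <= cp <= 0x1A4C


def split_clusters(text, medials_join=True):
    """Split text into toy consonant clusters.

    A cluster starts at a letter and absorbs every following
    SAKOT + letter pair.  If medials_join is true, it also absorbs
    the medial signs above without requiring a SAKOT before them.
    Any other character becomes a cluster of its own.
    """
    cps = [ord(ch) for ch in text]
    clusters = []
    i = 0
    while i < len(cps):
        start = i
        i += 1
        if is_letter(cps[start]):
            while i < len(cps):
                if (cps[i] == SAKOT and i + 1 < len(cps)
                        and is_letter(cps[i + 1])):
                    i += 2  # subjoined consonant introduced by SAKOT
                elif medials_join and cps[i] in MEDIAL_SIGNS:
                    i += 1  # medial sign joins without SAKOT
                else:
                    break
        clusters.append(text[start:i])
    return clusters


# U+1A38 followed directly by U+1A56 (no SAKOT in between).
print(len(split_clusters("\u1A38\u1A56", medials_join=True)))
print(len(split_clusters("\u1A38\u1A56", medials_join=False)))
```

With `medials_join=False` the medial sign is split off from its base, which is
roughly the situation described in the quoted replies if the syllable rules
insist on SAKOT; with `medials_join=True` the two stay together.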

> 
> On Thursday, 24 May 2012, Behdad Esfahbod <behdad at behdad.org> wrote:
>> On 05/23/2012 06:48 PM, Andrew Cunningham wrote:
>>> I think what Ed is saying is that Tai Tham follows a similar model to Myanmar
>>> rather than a pure Indic model, where you have distinct medials vs. subjoined
>>> consonants: subjoined consonants require a virama and medials don't.
>>
>> I see.  Thanks for the clarification.
>>
>>> Part of a fundamental change to Myanmar between Unicode 4.1 and 5.1.
>>
>> Good to know.  I'll give HB a run on my Myanmar corpus and see if I can fix a
>> few high-impact issues.
>>
>>> Will look at my sources to confirm for Tai Tham.
>>
>> Thanks,
>> b
>>
>>> A.
>>>
>>> On Thursday, 24 May 2012, Behdad Esfahbod <behdad at behdad.org> wrote:
>>>> Hi Thep,
>>>>
>>>> Humm, the message from Ed that you are replying to never made it to me or to
>>>> the list.  Replies inline.
>>>>
>>>>
>>>> On 05/23/2012 06:53 AM, Theppitak Karoonboonyanan wrote:
>>>>> Hi, Ed, Behdad,
>>>>>
>>>>> On Sun, May 20, 2012 at 3:45 AM, Ed Trager <ed.trager at gmail.com> wrote:
>>>>>> On Fri, May 18, 2012 at 5:48 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
>>>>>>> On 05/18/2012 04:02 PM, Ed Trager wrote:
>>>>>>>>
>>>>>>>> In Tai Tham, U+1A6E VOWEL SIGN E needs to be shifted all the way to
>>>>>>>> the left so that the final visual appearance would be:
>>>>>>>
>>>>>>> Are you sure?  Without U+1A60 TAI THAM SIGN SAKOT before the subjoined
>>>>>>> consonant?  Reading Unicode suggests that you need that sign between PA
>>>>>>> and LA.
>>>>>>
>>>>>> For most subjoined consonants, yes, that's true.  But note in
>>>>>> particular that U+1A56 MEDIAL LA and U+1A57 MEDIAL LA TANG LAI were
>>>>>> encoded separately.  In the case of these two "LA" signs, I believe
>>>>>> there are two reasons justifying the separate encoding:
>>>>>>
>>>>>> (1) These are variant forms of the same subjoined letter LA:
>>>>>> apparently, there is no other good way to do it other than encoding
>>>>>> both.
>>>>>>
>>>>>> (2) Both of these LA signs can be part of triple consonant clusters,
>>>>>> i.e. "KLW" appears in the common Thai / Tai word for banana,
>>>>>> กล้วย, "klwy".  In Tai Tham, both the L and the W appear as
>>>>>> below-base stacked forms (and actually the "y" is also a subjoined
>>>>>> form, but it's kind of hanging off the right side of the whole stack).
>>>>
>>>> I'm not questioning the separate encoding.  I don't care :-).  What I'm saying
>>>> is that you need a SAKOT before them for them to be considered part of the
>>>> same syllable according to the Indic OpenType spec and my implementation.
>>>> Now, if you think Unicode intended these to subjoin without a SAKOT, then I'd
>>>> like you to point me to documentation about that.
>>>>
>>>> If that is the case, we would need changes to the Indic machine.  Not
>>>> impossible, but I first want to make sure that it is indeed the case.
>>>>
>>>> behdad
>>>>
>>>>
>>>>
>>>>>> There are some other separately-encoded subjoining consonant signs:
>>>>>> U+1A5B, U+1A5C, U+1A5D, U+1A5E.
>>>>>
>>>>> Please also count U+1A55 (MEDIAL RA) in the rule, although it's not a
>>>>> subjoined form.
>>>>>
>>>>> Regards,
>>>>> -Thep.
>>>
>>> --
>>> Andrew Cunningham
>>> Senior Project Manager, Research and Development
>>> Vicnet
>>> State Library of Victoria
>>> Australia
>>>
>>> andrewc at vicnet.net.au
>>> lang.support at gmail.com
>>
> 


