[HarfBuzz] Shaping Tai Tham Analogue of Thai Sara Am

Fri Nov 11 00:47:38 UTC 2016

Printed Northern Thai in the Lanna script has a close analogue (or
rather two) of U+0E33 THAI CHARACTER SARA AM.  It was initially
proposed to encode what would have become TAI THAM SIGN AM and TAI THAM
SIGN TALL AM.  However, leading figures in the UTC insisted that these
compound vowel symbols be encoded as what became <U+1A63 TAI THAM VOWEL
SIGN AA, U+1A74 TAI THAM SIGN MAI KANG> and <U+1A64 TAI THAM VOWEL
SIGN TALL AA, U+1A74 TAI THAM SIGN MAI KANG>.

To shape these vowel symbols in the printed Northern Thai style (which
is also many people's handwritten style), I have to transfer the
non-spacing mark MAI KANG from the spacing mark SIGN AA or SIGN TALL AA
to the preceding consonant.  I had been doing this by aping the
processing for Thai, i.e.:

Prelude:
1. Ligate SIGN AA and MAI KANG to glyph 'mai_kam', which imitates THAI
CHARACTER SARA AM.

Imitation of Thai:
2. Decompose ('multiple substitution') 'mai_kam' to mark
'leftward_mai_kang' and the glyph 'mai_kaa' of SIGN AA.

3. Interchange 'leftward_mai_kang' and any immediately preceding tone
mark.  (The Thai rendering engine does this, but USE does not.)

Finalisation:
4. Replace 'leftward_mai_kang' by the normal glyph for MAI KANG.
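
To make the four steps concrete, here is a toy model of them operating on
a plain list of glyph names.  It is only a sketch of the intended
reordering, not how the font or any shaper implements it.  The names
'mai_kam', 'leftward_mai_kang', 'mai_kaa' and 'mai_kang' are the ones used
above; every other glyph name ('high_ha', 'ma_subjoined', 'tone1',
'mai_sam') and the TONE_MARKS set are hypothetical, invented for the
illustration.

```python
# Toy model of steps 1-4 on a list of glyph names.
# Hypothetical names: high_ha, ma_subjoined, tone1, tone2, mai_sam.
TONE_MARKS = {"tone1", "tone2"}  # assumed set of tone-mark glyphs


def shape(glyphs):
    # Step 1: ligate SIGN AA + MAI KANG into 'mai_kam'.
    ligated = []
    i = 0
    while i < len(glyphs):
        if glyphs[i] == "mai_kaa" and glyphs[i + 1:i + 2] == ["mai_kang"]:
            ligated.append("mai_kam")
            i += 2
        else:
            ligated.append(glyphs[i])
            i += 1

    # Step 2: decompose 'mai_kam' into the mark followed by the base.
    split = []
    for g in ligated:
        if g == "mai_kam":
            split.extend(["leftward_mai_kang", "mai_kaa"])
        else:
            split.append(g)

    # Step 3: swap 'leftward_mai_kang' with an immediately preceding tone mark.
    for i in range(1, len(split)):
        if split[i] == "leftward_mai_kang" and split[i - 1] in TONE_MARKS:
            split[i - 1], split[i] = split[i], split[i - 1]

    # Step 4: restore the normal MAI KANG glyph.
    return ["mai_kang" if g == "leftward_mai_kang" else g for g in split]


result = shape(["high_ha", "ma_subjoined", "tone1", "mai_kaa", "mai_kang"])
print(result)
```

In this model MAI KANG ends up before the tone mark and therefore next to
the consonant stack, which is the ordering the Thai-style processing is
meant to achieve.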

However, I have found a problem when words are reduplicated, e.g. _mam
mam_, an invitation to a child to eat formed from _mam_ 'to eat'.  (I
haven't found the reduplicated word in a dictionary - it may be a
family word.)  The normal way to mark reduplication is to suffix the
combining mark above U+1A7B TAI THAM SIGN MAI SAM to the word.  The
encoding would therefore be <U+1A49 TAI THAM LETTER HIGH HA, U+1A60 TAI
THAM SIGN SAKOT, U+1A3E TAI THAM LETTER MA, U+1A75 TAI THAM SIGN TONE-1,
U+1A63 TAI THAM VOWEL SIGN AA, U+1A74 TAI THAM SIGN MAI KANG, U+1A7B
TAI THAM SIGN MAI SAM> (ᩉ᩠ᨾ᩵ᩣᩴ᩻). Now, USE inserts U+25CC before SIGN
AA and MS USE therefore also before all the subsequent marks.  However,
it seems that the effects of cleaning them out are irrelevant to my
problem, for this also occurs when I use a Latin to Lanna
transliteration mode, where these inserts do not occur.  It also shows
up in the Lanna script in LibreOffice 5.0.6 on Ubuntu, which appears to
be using the old SEA rendering engine.
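
For anyone wanting to reproduce the test string, the following sketch
builds it from the code points listed above and prints each character's
general category and name using Python's standard unicodedata module.
The helper name describe() is made up for this example, and the output
depends on the Unicode version bundled with the Python in use.

```python
import unicodedata

# Code points exactly as listed in the encoding of _mam mam_ above.
MAM_MAM = [0x1A49, 0x1A60, 0x1A3E, 0x1A75, 0x1A63, 0x1A74, 0x1A7B]


def describe(code_points):
    """Return (U+XXXX, general category, character name) per code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch)))
    return rows


# The string to paste into a shaper test or a document.
word = "".join(chr(cp) for cp in MAM_MAM)

for row in describe(MAM_MAM):
    print(*row)
```

The category column should show SIGN AA as a spacing mark (Mc) and both
MAI KANG and MAI SAM as non-spacing marks (Mn), which is the distinction
the reordering above depends on.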

My problem is that the MAI SAM glyph is not treated as being on the
mai_kaa glyph regenerated by splitting mai_kam.  When mai_kam is split
into the mark leftward_mai_kang and the base mai_kaa (in that order),
the MAI SAM glyph is associated with the leftward_mai_kang rather than
the residual base mai_kaa. This logic was introduced in 2012 to match
Microsoft's treatment of marks on an Arabic consonant ligature split by
a multiple substitution.  The practical consequence is that the MAI SAM
glyph is attached directly to the glyph for HIGH HA.  In my font it is
then obscured by the mai_kang glyph, which is also attached directly,
and seems to disappear!

The problem does not occur with USE as implemented in MS Edge - the
word renders as expected.

(I can simplify the problem to the totally unattested word <U+1A2F TAI
THAM LETTER DA, U+1A63, U+1A74, U+1A7B> /dam dam/.)

Now, should I have expected my aping of the Thai and Lao rendering to
work, or should I only have trusted the base-mark association logic
needed for Arabic?

Richard.