[HarfBuzz] Thai below-base normalization

Mon Feb 3 05:11:49 CET 2014

Dear Richard,

> Note that U+0E3A does occur following upper vowels (U+0E34-7).

Indeed.

> Does this denote a rendering interaction, or is <U+0E3A, U+0E34> just
> the obvious way of entering what someone (who?) says should be <U+0E34,
> U+0E3A>?  <U+0E34, U+0E3A> breaches the rule of marks below and then
> marks above.

Firstly, there is no such rule. It's just a convention.

Secondly, U+0E34 has CCC=0, and therefore U+0E3A can occur before or after it, and the <sarcasm>all wonderful</sarcasm> normalization algorithm will sort it out: canonical reordering never moves a mark across a character with CCC=0, so both orders are left exactly as entered. Given that in this context the pintu is modifying the vowel, it seems natural to store it after the vowel (hence U+0E34 U+0E3A). But there can be other language contexts in which the pintu is modifying the consonant, and there the opposite order would be used. Perhaps this is a case where normalization might have helped, and where accepting a less-than-optimal linguistic order would have resulted in clearer data storage. Either way, implementations should handle both orders (which, given the current CCC values, is what will happen).
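
For anyone who wants to check this, here is a minimal sketch using Python's standard unicodedata module. It assumes the Unicode data shipped with your Python matches the current UCD for these characters; the helper name survives_normalization and the choice of U+0E01 as the base consonant are mine, purely for illustration.

```python
import unicodedata

KO_KAI = "\u0e01"   # base consonant, chosen only as an example
SARA_I = "\u0e34"   # upper vowel
PHINTHU = "\u0e3a"  # the below-base mark under discussion


def survives_normalization(text):
    """Return True if no normalization form changes the string."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


vowel_first = KO_KAI + SARA_I + PHINTHU
phinthu_first = KO_KAI + PHINTHU + SARA_I

# Canonical combining classes of the two marks.
print(unicodedata.combining(SARA_I), unicodedata.combining(PHINTHU))

# Neither order is rewritten, so the two sequences stay distinct.
print(survives_normalization(vowel_first))
print(survives_normalization(phinthu_first))
print(unicodedata.normalize("NFC", vowel_first)
      == unicodedata.normalize("NFC", phinthu_first))
```

If both sequences come back unchanged and still differ from each other, then a shaper or a search routine sees both orders and has to cope with them itself.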

Yours,
Martin