[HarfBuzz] Dotted Circles in Tai Tham

Richard Wordingham richard.wordingham at ntlworld.com
Tue Feb 24 12:03:43 PST 2015


On Tue, 24 Feb 2015 09:26:41 -0800
Roozbeh Pournader <roozbeh at google.com> wrote:

> On Tue, Feb 24, 2015 at 5:03 AM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:
> 
> > Are we still left with IndicSyllabicCategory.txt as the only
> > functional definition of the properties?
 
> Not necessarily. USE seems to use a combination of Indic syllabic,
> Indic positional, and general categories, with some codepoints as
> exceptions. HarfBuzz has been using some very similar techniques too,
> with tables automatically derived from the Unicode data files and
> then some exceptions in code.

That's what I'd call a *formal* definition.  The definition of
well-formed clusters by USE provides what I would regard as a
*functional* definition.  One can then classify a character by where it
occurs.  Of course, USE need not have captured all combinations, and
indeed I say it has not.

> > 1. Is <consonant><dependent_vowel>_<dependent_vowel> an allowed
> > context for a 'Consonant_Medial' if it is allowed for an invisible
> > stacker plus consonant?

<snip>

> > 3. Are they allowed contexts for 'Consonant_Subjoined' if they are
> > allowed for an invisible stacker plus consonant?

> They could be, as soon as we have evidence that there is need for
> allowing them (if we don't allow them at the moment). Generally, give
> us the character sequence that should work and doesn't, and why your
> sequence is correct according to Unicode encoding of a script, and
> HarfBuzz will get the patterns fixed to allow the character sequence.

That's circular!  The USE makes very little distinction between a
consonant_medial and consonant_subjoined.  One distinction is that a
consonant_medial cannot be followed by <invisible_stacker, consonant>.
The MFL Revision 1, p. 801 (I need this reference for the UTC) has eight
words starting with the cluster <HIGH HA, MEDIAL LA, SAKOT, WA> /lw/,
so U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA should be
'Consonant_Subjoined'.  I've also seen it after a vowel in another
dictionary.  There is a word in which U+1A55 TAI THAM CONSONANT SIGN
MEDIAL RA phonetically follows a written vowel, so that eliminates the
medial consonants as a Tai Tham category!
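
To make the MEDIAL LA case concrete, here is a minimal Python sketch of
the one distinction described above.  It is a toy model, not the USE
cluster grammar: the category labels (B, H, CM, CS), the lookup table
and the function names are assumptions made up for this illustration,
and the rule is read as "no <invisible_stacker, consonant> pair anywhere
later in the cluster".  Only the character names come from Unicode.

```python
import unicodedata

# The cluster <HIGH HA, MEDIAL LA, SAKOT, WA> discussed above, built from
# Unicode character names so that no code point is typed by hand.
CLUSTER_NAMES = [
    "TAI THAM LETTER HIGH HA",
    "TAI THAM CONSONANT SIGN MEDIAL LA",
    "TAI THAM SIGN SAKOT",
    "TAI THAM LETTER WA",
]
CLUSTER = "".join(unicodedata.lookup(name) for name in CLUSTER_NAMES)


def categories(text, medial_la_class):
    """Map each character to a toy category label.

    B = base consonant, H = invisible stacker,
    CM = consonant_medial, CS = consonant_subjoined.
    The table is hypothetical and covers only the sample cluster.
    """
    table = {
        "TAI THAM LETTER HIGH HA": "B",
        "TAI THAM LETTER WA": "B",
        "TAI THAM SIGN SAKOT": "H",
        "TAI THAM CONSONANT SIGN MEDIAL LA": medial_la_class,
    }
    return [table[unicodedata.name(ch)] for ch in text]


def medial_rule_ok(cats):
    """Return False if a CM is followed later by <H, B> (assumed reading)."""
    for i, cat in enumerate(cats):
        if cat == "CM":
            rest = cats[i + 1:]
            if any(a == "H" and b == "B" for a, b in zip(rest, rest[1:])):
                return False
    return True


# Classified as a medial, the attested cluster breaks the rule ...
print(medial_rule_ok(categories(CLUSTER, "CM")))  # False
# ... classified as subjoined, it does not.
print(medial_rule_ok(categories(CLUSTER, "CS")))  # True
```

Under this reading, the attested cluster is only acceptable when
MEDIAL LA is treated as subjoined rather than medial.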

Richard.

