[HarfBuzz] Dotted Circles in Tai Tham
Richard Wordingham
richard.wordingham at ntlworld.com
Tue Feb 24 12:03:43 PST 2015
On Tue, 24 Feb 2015 09:26:41 -0800
Roozbeh Pournader <roozbeh at google.com> wrote:
> On Tue, Feb 24, 2015 at 5:03 AM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:
>
> > Are we still left with IndicSyllabicCategory.txt as the only
> > functional definition of the properties?
> Not necessarily. USE seems to use a combination of Indic syllabic,
> Indic positional, and general categories, with some codepoints as
> exceptions. HarfBuzz has been using some very similar techniques too,
> with tables automatically derived from the Unicode data files and
> then some exceptions in code.
That's what I'd call a *formal* definition. The definition of
well-formed clusters by USE provides what I would regard as a
*functional* definition. One can then classify a character by where it
occurs. Of course, USE need not have captured all combinations, and
indeed I say it has not.
> > 1. Is <consonant><dependent_vowel>_<dependent_vowel> an allowed
> > context for a 'Consonant_Medial' if it is allowed for an invisible
> > stacker plus consonant?
<snip>
> > 3. Are they allowed contexts for 'Consonant_Subjoined' if they are
> > allowed for an invisible stacker plus consonant?
> They could be, as soon as we have evidence that there is need for
> allowing them (if we don't allow them at the moment). Generally, give
> us the character sequence that should work and doesn't, and why your
> sequence is correct according to Unicode encoding of a script, and
> HarfBuzz will get the patterns fixed to allow the character sequence.
That's circular! The USE makes very little distinction between a
consonant_medial and consonant_subjoined. One distinction is that a
consonant_medial cannot be followed by <invisible_stacker, consonant>.
The MFL Revision 1 p801 (I need this reference for the UTC) has eight
words starting with the cluster <HIGH HA, MEDIAL LA, SAKOT, WA> /lw/,
so U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA should be
'Consonant_Subjoined'. I've also seen it after a vowel in another
dictionary. There is a word in which U+1A55 TAI THAM CONSONANT SIGN
MEDIAL RA phonetically follows a written vowel, so that eliminates the
medial consonants as a Tai Tham category!
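
To make the point about the <HIGH HA, MEDIAL LA, SAKOT, WA> cluster
concrete, here is a minimal sketch. It is a toy pattern, not the USE
grammar: the one-letter category codes, the regular expression and the
helper names are all assumptions made for illustration. It encodes only
the one distinction mentioned above, that a consonant_medial cannot be
followed by <invisible_stacker, consonant>.

```python
import re

# Hypothetical one-letter category codes (not the USE's own symbols):
#   B = base consonant        M = consonant_medial
#   S = consonant_subjoined   H = invisible stacker (SAKOT)
#   V = dependent vowel
# Toy rule: subjoined signs and <H, B> pairs may be mixed freely after
# the base, but once a medial has appeared no <H, B> pair may follow.
TOY_CLUSTER = re.compile(r"B(?:HB|S)*M*V*")


def is_well_formed(categories):
    """Return True if the whole category sequence matches the toy pattern."""
    return TOY_CLUSTER.fullmatch("".join(categories)) is not None


def categorise(medial_la_class):
    """Categories for <HIGH HA, MEDIAL LA, SAKOT, WA>, with the class of
    MEDIAL LA passed in as "M" or "S"."""
    return ["B", medial_la_class, "H", "B"]


if __name__ == "__main__":
    for label in ("M", "S"):
        print(label, is_well_formed(categorise(label)))
```

Under this toy rule the cluster is rejected while MEDIAL LA is treated
as a medial and accepted once it is treated as subjoined, which is the
reclassification argued for above.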
Richard.