[HarfBuzz] Dotted Circles in Tai Tham

Richard Wordingham richard.wordingham at ntlworld.com
Thu Feb 26 11:04:17 PST 2015


On Thu, 26 Feb 2015 09:09:31 -0800
Behdad Esfahbod <behdad at behdad.org> wrote:

> Hi Richard,
> 
> I was away for a few weeks.  I'm glad you and Roozbeh got into
> discussion. Working with him and Andrew is indeed the best way
> forward.  Note that as you observed, SEA is very liberal in what it
> accepts.  That's simply because we didn't know any better.

Actually, Martin Hosken once presented a very similar production to the
UTC, but without the algebraic simplification.  No-one remarked that
there wasn't much that it disallowed.

The very first word in the MFL dictionary is <HIGH KA, SIGN U, TONE-2,
SIGN AA, SAKOT, NA, SAKOT, NGA> /kaankuŋ/ 'to prosper', with two(!)
final consonants attached below the visually final vowel.  It renders
fine on LibreOffice at the moment - thanks to HarfBuzz.  I wrestled and
failed with the problem of encoding this word phonetically.

> What would immensely help is to gather sequences that you (and
> others) think should be considered one syllable.  We can then add
> these to Roozbeh's indic repository as test data (with the USE
> grammar).  That will be extremely valuable, and I'm willing to set up
> the code to run the tests.

I take it you're looking for a regular expression.  Would this be a
regular expression for strings of symbols, rather than traces?  (Traces
are defined from strings by allowing certain pairs of 'letters' to
commute - fully decomposed character strings under canonical equivalence
are the example that interests us.  The theory gets messy with Kleene
star.)  I
notice USE seems, from the Buginese and some (all?) of the Tibetan
overrides, to be working by matching NFD strings against the patterns.
May I assume a suitable permutation of the non-zero canonical combining
classes?
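
To make the trace idea concrete, here is a minimal sketch in Python using
only the standard unicodedata module.  The helper names commute() and
same_trace() are hypothetical, invented for illustration, and this is not
how USE or HarfBuzz does its matching.  It assumes that two adjacent code
points commute exactly when both have non-zero and different canonical
combining classes, and that two strings belong to the same trace when
their NFD forms are identical.

```python
import unicodedata


def commute(a: str, b: str) -> bool:
    """Return True if adjacent code points a and b may be swapped
    without leaving the canonical equivalence class (hypothetical helper)."""
    ca = unicodedata.combining(a)
    cb = unicodedata.combining(b)
    # Starters (class 0) block reordering; marks of equal class keep their order.
    return ca != 0 and cb != 0 and ca != cb


def same_trace(s: str, t: str) -> bool:
    """Return True if s and t are canonically equivalent, i.e. their
    fully decomposed, canonically ordered (NFD) forms are identical."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


# U+0301 COMBINING ACUTE ACCENT has class 230 and
# U+0323 COMBINING DOT BELOW has class 220, so the pair commutes.
acute, dot_below = "\u0301", "\u0323"
print(commute(acute, dot_below))
print(same_trace("q" + acute + dot_below, "q" + dot_below + acute))
```

A pattern written against strings would need to accept every ordering
within such a trace, or else be applied only after the input has been put
into one fixed order such as NFD.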

Alternatively, are you just looking for a probing test set of real
words?

I can tackle the Tai Tham script.  Other scripts are likely to get a
sketchy treatment from me, probably based on what I can glean from the
encoding proposals.

Richard.

