[HarfBuzz] Dotted Circles in Tai Tham
Richard Wordingham
richard.wordingham at ntlworld.com
Thu Feb 26 11:04:17 PST 2015
On Thu, 26 Feb 2015 09:09:31 -0800
Behdad Esfahbod <behdad at behdad.org> wrote:
> Hi Richard,
>
> I was away for a few weeks. I'm glad you and Roozbeh got into
> discussion. Working with him and Andrew is indeed the best way
> forward. Note that as you observed, SEA is very liberal in what it
> accepts. That's simply because we didn't know any better.
Actually, Martin Hosken once presented a very similar production to the
UTC, but without the algebraic simplification. No-one remarked that
there wasn't much that it disallowed.
The very first word in the MFL dictionary is <HIGH KA, SIGN U, TONE-2,
SIGN AA, SAKOT, NA, SAKOT, NGA> /kaankuĊ/ 'to prosper', with two(!)
final consonants attached below the visually final vowel. It renders
fine in LibreOffice at the moment, thanks to HarfBuzz. I wrestled with
the problem of encoding this word phonetically, and failed.
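To make the encoding concrete, here is a minimal Python sketch (standard `unicodedata` module only) that builds the word from its character names and lists each code point with its canonical combining class. The full Unicode names used for the short names above are my assumption; `unicodedata.lookup()` raises `KeyError` if one of them is wrong. The listing makes visible that SIGN AA is followed by two SAKOT + consonant pairs.

```python
import unicodedata

# Full Unicode names assumed for <HIGH KA, SIGN U, TONE-2, SIGN AA,
# SAKOT, NA, SAKOT, NGA>; lookup() raises KeyError on a wrong name.
NAMES = [
    "TAI THAM LETTER HIGH KA",
    "TAI THAM VOWEL SIGN U",
    "TAI THAM SIGN TONE-2",
    "TAI THAM VOWEL SIGN AA",
    "TAI THAM SIGN SAKOT",
    "TAI THAM LETTER NA",
    "TAI THAM SIGN SAKOT",
    "TAI THAM LETTER NGA",
]
word = "".join(unicodedata.lookup(n) for n in NAMES)

# Print code point, canonical combining class and name for each character.
for ch in word:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch):3d} {unicodedata.name(ch)}")
```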
> What would immensely help is to gather sequences that you (and
> others) think should be considered one syllable. We can then add
> these to Roozbeh's indic repository as test data (with the USE
> grammar). That will be extremely valuable, and I'm willing to set up
> the code to run the tests.
I take it you're looking for a regular expression. Would this be a
regular expression for strings of symbols, rather than traces? (Traces
are defined from strings by allowing certain pairs of 'letters' to
commute; fully decomposed character strings under canonical equivalence are the
example that interests us. The theory gets messy with Kleene star.) I
notice that USE seems to work by matching NFD strings against the
patterns, judging from the Buginese and some (all?) of the Tibetan
overrides.
May I assume a suitable permutation of the non-zero canonical combining
classes?
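The trace view can be made concrete by choosing the canonically ordered (NFD) string as the representative of each trace and matching a regular expression against that representative. A minimal Python sketch, using only the standard `re` and `unicodedata` modules, follows. `canonical_order`, `TOY_SYLLABLE` and `matches_trace` are hypothetical names, and the pattern is a toy using Latin marks, not USE's grammar or HarfBuzz code. It assumes the pattern lists marks in ascending combining-class order, i.e. a suitable permutation of the non-zero classes.

```python
import re
import unicodedata


def canonical_order(s: str) -> str:
    """Canonical Ordering Algorithm on an already-decomposed string:
    stably sort each maximal run of non-starters (ccc != 0) by ccc."""
    out: list[str] = []
    run: list[str] = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter closes the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


# Adjacent marks commute iff their classes are both non-zero and different:
# U+0323 (ccc 220) and U+0301 (ccc 230) commute; U+0300 and U+0301 (both 230) do not.
assert canonical_order("a\u0301\u0323") == canonical_order("a\u0323\u0301")
assert canonical_order("a\u0300\u0301") != canonical_order("a\u0301\u0300")

# Hypothetical toy "syllable" pattern, NOT USE's grammar: a base letter,
# then optional dot below, then optional acute, in ascending ccc order.
TOY_SYLLABLE = re.compile("[a-z]\u0323?\u0301?")


def matches_trace(s: str) -> bool:
    """True if some string canonically equivalent to s matches TOY_SYLLABLE.
    Matching the NFD form suffices here because every string the pattern
    accepts is already in canonical order."""
    return TOY_SYLLABLE.fullmatch(unicodedata.normalize("NFD", s)) is not None
```

Matching the raw string `"a\u0301\u0323"` fails, because the marks are not in pattern order, whereas `matches_trace` accepts it. If a pattern accepted strings that are not in canonical order, matching only the NFD representative would no longer be enough.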
Alternatively, are you just looking for a probing test set of real
words?
I can tackle the Tai Tham script. Other scripts are likely to get a
sketchy treatment from me, probably based on what I can glean from the
encoding proposals.
Richard.