[HarfBuzz] Thai shaping and dotted circle

Richard Wordingham richard.wordingham at ntlworld.com
Mon Sep 30 15:49:14 PDT 2013


On Mon, 30 Sep 2013 16:42:35 -0400
Behdad Esfahbod <behdad at behdad.org> wrote:

> On 13-09-30 04:22 PM, Richard Wordingham wrote:
> > What makes eminent sense, but is probably unduly hard, is to treat
> > allegedly 'invalid combinations' as invalid if mark-to-mark
> > positioning is not employed.
> 
> Actually it doesn't.  What's 'invalid' is a text encoding issue.
> That's independent of the font used.  The font may be broken, but
> that doesn't make the text invalid.

I was looking at it from the point of view of usefulness.

As a text encoding issue, invalidity is a matter of sorting out what
cannot be interpreted, such as a run of variation selectors, or what
canonical equivalence cannot handle, such as sequences of
non-interacting marks of combining class 0.
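The difference between class 0 marks and marks with non-zero combining
classes can be checked with Python's standard `unicodedata` module. This
is a minimal sketch, not anything from HarfBuzz: it shows that SARA II
(U+0E35) has combining class 0, so normalization leaves <SARA II, SARA II>
alone and the doubled spelling is not canonically equivalent to the single
one. By contrast, SARA U (U+0E38, class 103) and MAI EK (U+0E48, class 107)
are put into canonical order, so either typing order is equivalent.

```python
import unicodedata

SARA_II = "\u0E35"  # THAI CHARACTER SARA II, combining class 0

def ccc_profile(text: str) -> list[tuple[str, int]]:
    """List (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

doubled = "\u0E2D" + SARA_II + SARA_II + "\u0E01"  # doubled spelling
single = "\u0E2D" + SARA_II + "\u0E01"             # อีก

# Class 0 marks are neither reordered nor merged by normalization, so the
# doubled spelling stays distinct from the single one in NFC and NFD.
for form in ("NFC", "NFD"):
    norm = unicodedata.normalize(form, doubled)
    print(form, norm == doubled, norm == unicodedata.normalize(form, single))

# Marks with non-zero classes are sorted by class: SARA U (103) moves
# before MAI EK (107), so both typing orders are canonically equivalent.
tone_first = "\u0E01\u0E48\u0E38"
print(unicodedata.normalize("NFD", tone_first) == "\u0E01\u0E38\u0E48")
print(ccc_profile(doubled))
```

So the class 0 case is exactly where canonical equivalence offers no help,
and any decision about validity has to be made on other grounds.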

For example, I don't see why <SARA II, SARA II> is not a valid encoding
of the two marks occurring together; if a font used mark-to-mark
positioning, there would be no problem with it as the encoding of a
bizarre grouping of characters.  The dotted circle serves no purpose
there: anyone who can read Thai and see properly would recognise the
repeated vowel for what it is, a bizarre combination.  (Actually, there
is another issue: the second SARA II may be removed because it is too
far above the baseline.  I am currently getting some grief from
LibreOffice because in some contexts even the first SARA II is being
removed as too far from the baseline!)
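The contrast between mark-to-mark positioning and positioning by
substitution alone can be put as a toy model. This is a hypothetical
sketch, not HarfBuzz's GPOS code: the anchor height and mark height are
invented font units, and the "no mark-to-mark" case simply attaches every
above mark to the base's anchor, which is the effect when each SARA II is
adjusted only by a contextual substitution keyed on the base.

```python
# Hypothetical font units; a real font supplies these via GPOS anchors.
BASE_MARK_ANCHOR_Y = 600  # where an above mark attaches on the base
MARK_HEIGHT_Y = 250       # vertical extent of one SARA II glyph

def above_mark_offsets(n_marks: int, use_mark_to_mark: bool) -> list[int]:
    """Return the y offset of each above mark on a single base.

    Without mark-to-mark attachment every mark hangs off the base anchor,
    so repeated marks land on the same spot.  With it, each mark after
    the first attaches to the top of the previous mark and stacks."""
    offsets: list[int] = []
    for i in range(n_marks):
        if i == 0 or not use_mark_to_mark:
            offsets.append(BASE_MARK_ANCHOR_Y)
        else:
            offsets.append(offsets[-1] + MARK_HEIGHT_Y)
    return offsets

def overprints(offsets: list[int]) -> bool:
    """True if two or more marks share a position and so draw on top of each other."""
    return len(set(offsets)) < len(offsets)

for mkmk in (False, True):
    offs = above_mark_offsets(2, mkmk)
    print("mark-to-mark" if mkmk else "base only", offs, overprints(offs))
```

With stacking, the doubled vowel is visibly doubled; without it, the two
copies coincide and look like one.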

However, if the SARA II is positioned by nothing more subtle than a
contextual substitution (to avoid clashes with ascenders) and no dotted
circle is present, then both copies are drawn in the same place and the
pair will be taken as a single SARA II.  This is when a dotted circle is
useful: spoofing is defeated, and a typing error is made visible.  A
great many Thai typing errors would be avoided if a dotted circle popped
up as typing progressed.
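The kind of typing aid described above can be sketched as a simple
character-level rule. This is an illustrative sketch only, not HarfBuzz's
Thai shaper or any real input method: the function name
`flag_stacked_above_vowels` and the set of "above vowels" are assumptions.
It inserts U+25CC DOTTED CIRCLE before an above vowel that directly follows
another above vowel, giving the extra mark its own visible base. Tone marks
are deliberately left out, since an above vowel followed by a tone mark is
normal Thai.

```python
DOTTED_CIRCLE = "\u25CC"

# Assumed, simplified class: MAI HAN-AKAT, SARA I, SARA II, SARA UE,
# SARA UEE and MAITAIKHU.  Tone marks and other marks are ignored here.
ABOVE_VOWELS = {"\u0E31", "\u0E34", "\u0E35", "\u0E36", "\u0E37", "\u0E47"}

def flag_stacked_above_vowels(text: str) -> str:
    """Insert a dotted circle before each above vowel that immediately
    follows another above vowel, so every extra mark is shown on its own
    dotted-circle base instead of overprinting the first."""
    out: list[str] = []
    prev = ""
    for ch in text:
        if ch in ABOVE_VOWELS and prev in ABOVE_VOWELS:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)

# <O ANG, SARA II, SARA II, KO KAI> gets a dotted circle before the
# second SARA II; the ordinary spelling of อีก is left unchanged.
print(flag_stacked_above_vowels("\u0E2D\u0E35\u0E35\u0E01"))
print(flag_stacked_above_vowels("\u0E2D\u0E35\u0E01"))
```

Because the rule looks only at the text, it behaves the same whatever font
is in use; deciding based on whether the font stacks marks would need
information the character sequence alone does not carry.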

I am wondering whether this particular combination is being entered
deliberately.  I got over a million raw Google hits for "อีีก"; since
"อีก" means 'additional, extra', the doubled spelling could be an
expressive writing of the word rather than an unnoticed typing error.

Richard.


