[HarfBuzz] Thai shaping and dotted circle

Richard Wordingham richard.wordingham at ntlworld.com
Mon Sep 30 13:22:20 PDT 2013


On Mon, 30 Sep 2013 19:36:19 +0100
Jonathan Kew <jfkthame at googlemail.com> wrote:

> On 30/9/13 19:08, Behdad Esfahbod wrote:
> > On 13-09-30 09:05 AM, Toresson, Alexander (EXT) wrote:
> >> Hello all,
> >>
> >>
> >> For example, for Bengali, a dotted circle (U+25CC) is inserted
> >> before standalone combining marks. The same is not done for Thai,
> >> except for the first character in a paragraph/text (--bot for
> >> hb-shape/hb-view). Why? According to
> >> http://www.microsoft.com/typography/otfntdev/thaiot/other.htm,
> >> “invalid combinations” should cause a dotted circle to be inserted.
> >
> > That's something we want to fix, but we have not got to it yet.
> >
> 
> ....although it raises the difficult and potentially controversial 
> question of what exactly is an "invalid combination".

And for the Thai script, there are a few general purpose diacritics
which are used by dictionary and minority writing systems, such as
U+0331 COMBINING MACRON BELOW and U+0359 COMBINING ASTERISK BELOW.

The Microsoft restrictions are pretty liberal, especially compared to
the Lao ones, which prohibit Pali.  However, the prohibition on two
tone marks is iffy, as in Tai Lue in the Lao script combinations of tone
marks do occur.  (Uniscribe, as described, also prohibits Tai Lue in the
Lao script.)
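
As a rough illustration of the two-tone-mark rule mentioned above, here
is a minimal Python sketch.  It assumes the rule can be approximated as
"no two adjacent tone marks", which is a simplification and is not taken
from Uniscribe or HarfBuzz; the function name is hypothetical.  The code
points are the four Thai tone marks (U+0E48..U+0E4B) and the four Lao
tone marks (U+0EC8..U+0ECB).

```python
# Hypothetical helper for illustration only; not part of HarfBuzz or Uniscribe.
# Assumption: the "two tone marks" rule is modelled as two ADJACENT tone marks.

THAI_TONES = {chr(cp) for cp in range(0x0E48, 0x0E4C)}  # U+0E48..U+0E4B
LAO_TONES = {chr(cp) for cp in range(0x0EC8, 0x0ECC)}   # U+0EC8..U+0ECB
TONE_MARKS = THAI_TONES | LAO_TONES


def find_double_tone_marks(text):
    """Return each index i where text[i] and text[i + 1] are both tone marks."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] in TONE_MARKS and text[i + 1] in TONE_MARKS
    ]
```

A shaper enforcing such a rule would presumably insert a dotted circle
at each reported position, which is exactly what would break Tai Lue
text written in the Lao script.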

What makes eminent sense, but is probably unduly hard to implement, is
to treat an allegedly 'invalid combination' as invalid only if
mark-to-mark positioning is not employed for it.  Superimposed marks are
unreadable and probably wrong.
Indeed, some of the abominable Latin script combinations lurking in the
Common Locale Data Repository (CLDR) would benefit from the automatic
insertion of dotted circles.

> >> Speaking of invalid combinations, it seems that HarfBuzz allows,
> >> for example, U+0E48 to be combined with Latin U+0041, which seems
> >> rather permissive.

Thai and Lao combining marks are frequently displayed on hyphen- or
'x'-shaped characters.  The preferred choices seem to be U+2013 EN DASH
for Thai and U+00D7 MULTIPLICATION SIGN for Lao, though I have
encountered the counter-claim that the latter is actually a sanserif
'x' when used as a base character.
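
To make the permissive end of the spectrum concrete, here is a minimal
Python sketch of dotted circle insertion.  It assumes a mark counts as
"standalone" only when it starts the text or follows whitespace; that
policy and the function names are my own illustration, not what HarfBuzz
or Uniscribe actually do.  Under this policy a mark on U+2013 or on a
Latin letter is left alone.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def is_mark(ch):
    """True for any character in a Unicode mark category (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def insert_dotted_circles(text):
    """Insert U+25CC before each mark that has no preceding base.

    Assumption (illustrative policy): a mark lacks a base only if it is
    the first character of the text or directly follows whitespace.
    Any other preceding character, of any script, is accepted as a base.
    """
    out = []
    prev = None
    for ch in text:
        if is_mark(ch) and (prev is None or prev.isspace()):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)
```

For example, insert_dotted_circles("\u0e48") yields U+25CC followed by
U+0E48, while "\u2013\u0e48" (EN DASH plus MAI EK) is returned unchanged.
A stricter policy would replace the whitespace test with a per-script
check of the preceding character.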

Richard.


