[HarfBuzz] Clustering and Hit Detection

Richard Wordingham richard.wordingham at ntlworld.com
Sat Apr 6 08:53:19 PDT 2013


Dear List,

I understood that one of the reasons for using a shaping engine to
order glyphs, rather than a sequence of substitutions, was so that
selecting glyphs at the visual level would select the corresponding
characters in the backing store.  However, it seems that default extended
grapheme clusters, and their extensions by a subjoined consonant, are
reported as an indivisible cluster.  This seems to make it very
difficult to work back from glyph to character.  Is there, therefore,
any reason not to effectively implement lower-level reorderings as
substitutions of the form a b -> b a?
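To make the hit-detection difficulty concrete, here is a minimal sketch of how a renderer might map each glyph back to the characters it covers. It assumes HarfBuzz-style cluster values (each glyph carries the index of the first input character of its cluster) that are non-decreasing in glyph order, as for left-to-right text. The function name and the example cluster values are hypothetical illustrations, not HarfBuzz API or actual HarfBuzz output.

```python
from typing import List, Tuple


def cluster_char_ranges(clusters: List[int], text_len: int) -> List[Tuple[int, int]]:
    """Map each glyph to the half-open character range [start, end) of its cluster.

    Assumption: cluster values follow the HarfBuzz convention of recording the
    index of the first character in the cluster, and are non-decreasing in
    glyph order (left-to-right text).
    """
    # Distinct cluster starts in ascending order; each cluster ends where the
    # next one begins, and the last one ends at the end of the text.
    starts = sorted(set(clusters))
    end_of = {s: e for s, e in zip(starts, starts[1:] + [text_len])}
    return [(c, end_of[c]) for c in clusters]


# Hypothetical: four glyphs all reported in cluster 0 for a three-character word.
print(cluster_char_ranges([0, 0, 0, 0], 3))
# Hypothetical: the same glyphs split into three clusters.
print(cluster_char_ranges([0, 1, 1, 2], 3))
```

With a single merged cluster, clicking any glyph can only select characters 0 to 3, i.e. the whole word; only when the shaper reports finer clusters can a glyph be traced back to an individual character.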

I do see a related problem.  Thai has a justification mode ('Thai
justification') in which spaces are increased between letters.  Preposed
and postposed vowels count as letters.  An issue appears to arise with
words like น้ำ <U+0E19 THAI CHARACTER NO NU, U+0E49 THAI CHARACTER MAI
THO, U+0E33 THAI CHARACTER SARA AM>.  LibreOffice 4.0.2.1 justifies
this as though it were composed of two clusters, <U+0E19, U+0E4D THAI
CHARACTER NIKHAHIT, U+0E49> and <U+0E32 THAI CHARACTER SARA AA>.
However, HarfBuzz declares the word to be one cluster.  How is a
renderer using HarfBuzz expected to perform Thai justification on such
a word?
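The justification problem can be sketched the same way. Assuming a renderer that may only widen spacing at boundaries between clusters (the only safe split points it is given), the sketch below distributes extra width over those boundaries. The function and the cluster values are hypothetical, chosen only to contrast a one-cluster word with a two-cluster analysis like LibreOffice's.

```python
from typing import Dict, List


def justification_gaps(clusters: List[int], extra_width: float) -> Dict[int, float]:
    """Spread extra_width evenly over boundaries between adjacent clusters.

    Returns a mapping from glyph index i to the extra advance inserted before
    glyph i. Assumption: gaps may only be placed where the cluster value
    changes between consecutive glyphs.
    """
    boundaries = [i for i in range(1, len(clusters)) if clusters[i] != clusters[i - 1]]
    if not boundaries:
        # One cluster: there is nowhere to insert a gap.
        return {}
    share = extra_width / len(boundaries)
    return {i: share for i in boundaries}


# Hypothetical: the whole word reported as one cluster.
print(justification_gaps([0, 0, 0, 0], 10.0))
# Hypothetical: a two-cluster analysis with the last glyph in its own cluster.
print(justification_gaps([0, 0, 0, 2], 10.0))
```

When the word is a single cluster, the renderer gets no insertion point at all, which is exactly the situation described above.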

There *may* be an even worse issue with Tai Tham.  If that is to use
Thai justification, preposed vowels (general category Mc) and
following vowels (also Mc) will need to have gaps inserted between them
and the consonant, but HarfBuzz gives no clue as to where the gap
occurs.  I don't know whether Thai justification should occur with Tai
Tham; pre-Unicode fonts that I have seen generally use ASCII character
codes for some of the glyphs, and that may inhibit Thai justification.

Richard.
