[HarfBuzz] Kerning in Telugu (and other similar languages)
mathog
mathog at caltech.edu
Wed Aug 13 17:02:18 PDT 2014
Greetings,
Behdad Esfahbod said that this would be a good place to ask, even though
the underlying software is Pango and not HarfBuzz. I am trying to
resolve a bug in Inkscape
https://bugs.launchpad.net/inkscape/+bug/1282968
that involves this string in Telugu:
U+0C07,U+0C02,U+0C15,U+0C4D,U+200C,U+0C38,U+0C4D,U+0C15,U+0C47,U+0C2A,U+0C4D
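For readers who want to see what each code point in that string is, here is a minimal Python sketch that decodes the listing and prints the Unicode general category and name of every character. It uses only the standard `unicodedata` module; the helper names `decode` and `describe` are made up for this illustration and are not part of Pango, HarfBuzz, or Inkscape.

```python
import unicodedata

# The code points exactly as listed in the message.
CODE_POINTS = (
    "U+0C07,U+0C02,U+0C15,U+0C4D,U+200C,U+0C38,"
    "U+0C4D,U+0C15,U+0C47,U+0C2A,U+0C4D"
)


def decode(listing):
    """Turn a 'U+XXXX,U+YYYY' listing into a Python string."""
    return "".join(chr(int(item[2:], 16)) for item in listing.split(","))


def describe(text):
    """Return (1-based index, 'U+XXXX', general category, name) per character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text, start=1)
    ]


text = decode(CODE_POINTS)
rows = describe(text)
for row in rows:
    print(*row)
```

The 1-based index in the first column matches the character numbering used later in this message.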
As will soon become apparent, I know next to nothing about Indic
languages. Counting each "stacked glyph" as one glyph, the string is
supposed to render as 6 glyphs:
https://bugs.launchpad.net/inkscape/+bug/1282968/+attachment/3989437/+files/correct-rendering.png
Pango breaks these 6 glyphs up into 3 logical clusters of 2, 3, and 1 glyphs (2:3:1).
First question: are the colon positions the proper places to insert
kerning spaces?
For most languages, the pango_shape() description has the property that
each logical cluster begins with a character that has the
"is_cursor_position" attribute set, and the attribute is not set anywhere
else in the cluster. (In the European languages each logical cluster is
usually one letter, possibly with one or more accents or other similar
modifiers.) That is almost the case here too, except that a second
character within the 2nd logical cluster also has that bit set
(character 8 of the string, U+0C15).
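The property described above can be written down as a small check. This is only a sketch over plain Python lists, not a call into Pango: `cluster_starts` and `is_cursor_position` are hypothetical inputs standing in for whatever Pango reports, indices are 0-based (the message counts from 1), and the sample data is a made-up Latin example rather than the Telugu string.

```python
def extra_cursor_stops(cluster_starts, is_cursor_position):
    """Indices flagged as cursor positions that are NOT the start of a cluster."""
    starts = set(cluster_starts)
    return [i for i, flag in enumerate(is_cursor_position)
            if flag and i not in starts]


def missing_cursor_stops(cluster_starts, is_cursor_position):
    """Cluster starts that are NOT flagged as cursor positions."""
    return [i for i in cluster_starts if not is_cursor_position[i]]


# Hypothetical example: "e" + U+0301 (combining acute accent) + "a".
# Two clusters, starting at characters 0 and 2.
toy_starts = [0, 2]

# Usual case: only the cluster starts are cursor positions.
usual_flags = [True, False, True]
print(extra_cursor_stops(toy_starts, usual_flags))    # []
print(missing_cursor_stops(toy_starts, usual_flags))  # []

# Case resembling the one reported here: an extra flag inside a cluster.
odd_flags = [True, True, True]
print(extra_cursor_stops(toy_starts, odd_flags))      # [1]
```

A non-empty result from `extra_cursor_stops` corresponds to the situation described for the 2nd logical cluster.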
Behdad referred me to this document:
http://www.w3cindia.in/Indic-req-draft/Indic-layout-requirements.html#letter-spac
which says that aksharas are supposed to move around as a block. Cursor
positions are generally where kerning spaces are inserted. I don't know
how to reconcile this situation, so...
Second question: Is the 2nd logical cluster returned by Pango something
larger than an akshara?
Third question: in a text editor for this sort of language, are the
cursor positions restricted to the akshara transitions, or does the
cursor move around within an akshara, stopping at each "stacked glyph"?
What happens now in Inkscape is that one can delete Unicode characters
off the tail of a logical cluster, but there is no way to move around
within it. This text can only be entered with control codes
(^U0C07<return>^U0C02<return> etc).
Fourth and last question: the string includes U+200C, which is a "zero
width non-joiner". There is no corresponding glyph; all it does is
prevent two adjacent characters from merging into a different kind of
glyph. In terms of cursor motion, if one were moving across the text
left to right with "right arrow" presses, would one expect the cursor to
stay in the corresponding spot for two such presses, as if moving across
a zero width character, or would its presence not affect cursor motion?
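To make the two alternatives in this question concrete, here is a small Python sketch of both cursor models. It is an illustration under stated assumptions, not a description of what Pango or Inkscape does: both functions are hypothetical, cursor stops are modeled naively as code point offsets, and the sample is a Latin toy string rather than the Telugu text.

```python
import unicodedata


def stops_per_code_point(text):
    """Model A: a cursor stop before every code point, plus one at the end."""
    return list(range(len(text) + 1))


def stops_skipping_format_chars(text):
    """Model B: like model A, but with no stop directly before a format (Cf)
    character, so that character is crossed together with its predecessor."""
    stops = [0]
    for i in range(1, len(text) + 1):
        if i < len(text) and unicodedata.category(text[i]) == "Cf":
            continue  # do not stop between a character and a following Cf
        stops.append(i)
    return stops


sample = "a\u200cb"  # "a", ZERO WIDTH NON-JOINER, "b"
print(stops_per_code_point(sample))         # [0, 1, 2, 3]
print(stops_skipping_format_chars(sample))  # [0, 2, 3]
```

In model A, stops 1 and 2 sit on either side of the zero-width character, so two "right arrow" presses would leave the cursor in the same visual spot. In model B the non-joiner has no effect on cursor motion.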
Thank you,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech