[HarfBuzz] Zero-width joiner has width
Jonathan Kew
jfkthame at gmail.com
Sun Aug 2 10:08:39 PDT 2015
On 2/8/15 17:45, Simon Cozens wrote:
> Here's an interesting one I came across when implementing Uyghur
> hyphenation. The trick in hyphenated Uyghur is to use a ZWJ to ensure
> that the last character of hyphenated Arabic morphemes remains in medial
> form. However, when I send an Arabic + ZWJ + hyphen sequence to HarfBuzz,
> it inserts a space between the hyphen and the Arabic:
>
>> zwj = SU.utf8char(0x200d)
>> text = "تئەۋ" .. zwj .. "-"
>> SILE.shaper:shapeToken(text, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
> {
> {
> codepoint = 16,
> depth = -1.943359375,
> height = 2.666015625,
> name = "hyphen",
> width = 3.681640625,
> },
> {
> codepoint = 3,
> depth = 0,
> height = 0,
> name = "space",
> width = 2.9296875,
> },
> {
> codepoint = 552,
> depth = 2.24609375,
> height = 6.2841796875,
> name = "uni06CB",
> width = 4.0087890625,
> },
> {
> codepoint = 2226,
> depth = 0.048828125,
> height = 4.580078125,
> name = "uni06D5.fina",
> width = 3.7939453125,
> },
> {
> codepoint = 3024,
> depth = 0.0048828125,
> height = 5.078125,
> name = "uni0626.medi_BaaBaaInit",
> width = 1.6845703125,
> },
> {
> codepoint = 3732,
> depth = 0.0634765625,
> height = 4.8779296875,
> name = "uni062A.init_BaaBaaIsol",
> width = 3.193359375,
> },
> }
>
> Making the case even simpler:
>
>> SILE.shaper:shapeToken(zwj, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
> {
> {
> codepoint = 3,
> depth = 0,
> height = 0,
> name = "space",
> width = 2.9296875,
> },
> }
>
> I would have hoped that a zero-width joiner had... zero width.
It's expected that you'll see a <space> glyph here, because harfbuzz
uses that as a replacement for default-ignorables; however, it also sets
the advance width to zero, so I'm not sure why you're seeing a non-zero
advance.
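As a rough illustration of the behaviour described above (a toy model only, not HarfBuzz's actual code), the Python sketch below maps a default-ignorable character to the space glyph but gives it a zero advance. The function name, the lookup tables, and the nominal space advance of 600 are all hypothetical, and the set of default-ignorables is a small sample rather than the full Unicode property.

```python
# Small, deliberately incomplete sample of default-ignorable code points
# (soft hyphen, ZWSP, ZWNJ, ZWJ, word joiner, zero-width no-break space).
DEFAULT_IGNORABLE_SAMPLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def shape_naively(codepoints, glyph_names, advances, space_glyph="space"):
    """Toy model: return (glyph_name, advance) pairs in input order.

    Default-ignorables are shown as the space glyph, but their advance is
    forced to zero instead of using the glyph's nominal advance.
    """
    shaped = []
    for cp in codepoints:
        if cp in DEFAULT_IGNORABLE_SAMPLE:
            shaped.append((space_glyph, 0))
        else:
            name = glyph_names[cp]
            shaped.append((name, advances[name]))
    return shaped


# Hypothetical tables; 600 is a made-up nominal advance for the space glyph.
glyph_names = {0x2D: "hyphen"}
advances = {"hyphen": 754, "space": 600}

result = shape_naively([0x200D, 0x2D], glyph_names, advances)
print(result)
```

A client that looks up the width of the returned glyph in the font, rather than reading the advance reported for that position, would pick up the nominal 600 here instead of 0.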
Testing locally with hb-shape, I get a zero-width <space> (as expected):
$ hb-unicode-encode 062A 0626 06D5 06CB 200d 2d | hb-shape amiri-regular.ttf
[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]
$ hb-unicode-encode 200b | hb-shape amiri-regular.ttf
[space=0+0]
Which suggests there's something odd about how you're using harfbuzz.
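One way to cross-check the two sets of numbers is to parse the hb-shape line and scale its advances into the units of the quoted SILE output. The sketch below assumes a 2048 units-per-em font set at size 10; neither value is stated in this thread, they are inferred because they make every other glyph match exactly. The helper names are hypothetical, and the parser only handles the simple `name=cluster+advance` form shown above.

```python
import re


def parse_hb_shape(line):
    """Parse '[name=cluster+advance|...]' into (name, cluster, advance) tuples."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        m = re.fullmatch(r"([^=]+)=(\d+)\+(-?\d+)", item)
        if m is None:
            raise ValueError(f"unrecognised glyph record: {item!r}")
        glyphs.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return glyphs


def mismatched_glyphs(hb_line, widths, upem, size):
    """Return names whose scaled hb-shape advance differs from the given width."""
    return [name for name, _cluster, adv in parse_hb_shape(hb_line)
            if adv * size / upem != widths[name]]


UPEM = 2048   # assumption: inferred from the numbers, not stated in the thread
SIZE = 10.0   # assumption: inferred from the numbers, not stated in the thread

hb_line = ("[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|"
           "uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]")

# Widths copied from the quoted SILE output.
sile_widths = {
    "hyphen": 3.681640625,
    "space": 2.9296875,
    "uni06CB": 4.0087890625,
    "uni06D5.fina": 3.7939453125,
    "uni0626.medi_BaaBaaInit": 1.6845703125,
    "uni062A.init_BaaBaaIsol": 3.193359375,
}

print(mismatched_glyphs(hb_line, sile_widths, UPEM, SIZE))
```

Under those assumptions only the space glyph disagrees: hb-shape reports an advance of 0, while the SILE width of 2.9296875 corresponds to 600 font units.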
JK