[HarfBuzz] Zero-width joiner has width

Jonathan Kew jfkthame at gmail.com
Sun Aug 2 10:08:39 PDT 2015


On 2/8/15 17:45, Simon Cozens wrote:
> Here's an interesting one I came across when implementing Uyghur
> hyphenation. The trick in hyphenated Uyghur is to use a ZWJ to ensure
> that the last character of hyphenated Arabic morphemes remains in medial
> form. However, when I send an Arabic + ZWJ + hyphen sequence to HarfBuzz,
> it inserts a space between the hyphen and the Arabic:
>
>> zwj = SU.utf8char(0x200d)
>> text = "تئەۋ" .. zwj .. "-"
>> SILE.shaper:shapeToken(text, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
> {
>    {
>      codepoint = 16,
>      depth = -1.943359375,
>      height = 2.666015625,
>      name = "hyphen",
>      width = 3.681640625,
>    },
>    {
>      codepoint = 3,
>      depth = 0,
>      height = 0,
>      name = "space",
>      width = 2.9296875,
>    },
>    {
>      codepoint = 552,
>      depth = 2.24609375,
>      height = 6.2841796875,
>      name = "uni06CB",
>      width = 4.0087890625,
>    },
>    {
>      codepoint = 2226,
>      depth = 0.048828125,
>      height = 4.580078125,
>      name = "uni06D5.fina",
>      width = 3.7939453125,
>    },
>    {
>      codepoint = 3024,
>      depth = 0.0048828125,
>      height = 5.078125,
>      name = "uni0626.medi_BaaBaaInit",
>      width = 1.6845703125,
>    },
>    {
>      codepoint = 3732,
>      depth = 0.0634765625,
>      height = 4.8779296875,
>      name = "uni062A.init_BaaBaaIsol",
>      width = 3.193359375,
>    },
> }
>
> Making the case even more simple:
>
>> SILE.shaper:shapeToken(zwj, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
> {
>    {
>      codepoint = 3,
>      depth = 0,
>      height = 0,
>      name = "space",
>      width = 2.9296875,
>    },
> }
>
> I would have hoped that a zero-width joiner had... zero width.

It's expected that you'll see a <space> glyph here, because harfbuzz 
uses that as a replacement for default-ignorables; however, it also sets 
the advance width to zero, so I'm not sure why you're seeing a non-zero 
advance.

Testing locally with hb-shape, I get a zero-width <space> (as expected):

$ hb-unicode-encode 062A 0626 06D5 06CB 200d 2d | hb-shape amiri-regular.ttf
[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]

$ hb-unicode-encode 200b | hb-shape amiri-regular.ttf
[space=0+0]

Which suggests there's something odd about how you're using harfbuzz.
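
If it helps to compare the two code paths mechanically, here is a minimal Python sketch that parses one line of hb-shape's default text output and reports any glyph of a given name whose advance is non-zero. The field layout (name=cluster+advance, with an optional @x,y offset before the advance) is inferred from the output shown here plus my assumption about the offset syntax; the function names are made up for illustration and are not part of HarfBuzz.

```python
import re

# One item of the assumed form: name=cluster[@dx,dy]+x_advance[,y_advance]
ITEM = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@-?\d+,-?\d+)?"          # optional offset, ignored here
    r"\+(?P<adv>-?\d+)(?:,-?\d+)?"  # x advance, optional y advance ignored
)

def parse_hb_shape(line):
    """Return a list of (glyph_name, cluster, x_advance) tuples."""
    body = line.strip().strip("[]")
    glyphs = []
    for item in body.split("|"):
        if not item:
            continue  # empty buffer, nothing to parse
        m = ITEM.fullmatch(item)
        if m is None:
            raise ValueError("unrecognised item: %r" % item)
        glyphs.append((m.group("name"), int(m.group("cluster")), int(m.group("adv"))))
    return glyphs

def nonzero_advance(line, name):
    """Glyphs called `name` whose x advance is not zero."""
    return [g for g in parse_hb_shape(line) if g[0] == name and g[2] != 0]

# The hb-shape line from the test above.
SAMPLE = ("[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|"
          "uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]")

if __name__ == "__main__":
    print(nonzero_advance(SAMPLE, "space"))
```

An empty list for "space" means the shaper reported a zero advance for the replacement glyph, so a non-zero width would have to be introduced somewhere after shaping.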

JK


