[HarfBuzz] Zero-width joiner has width

Simon Cozens simon at simon-cozens.org
Sun Aug 2 09:45:09 PDT 2015


Here's an interesting one I came across when implementing Uyghur
hyphenation. The trick in hyphenated Uyghur is to use a ZWJ to ensure
that the last character of hyphenated Arabic morphemes remains in medial
form. However, when I send an Arabic + ZWJ + hyphen sequence to HarfBuzz,
it inserts a space between the hyphen and the Arabic:

> zwj = SU.utf8char(0x200d)
> text = "تئەۋ" .. zwj .. "-"
> SILE.shaper:shapeToken(text, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
{
  {
    codepoint = 16,
    depth = -1.943359375,
    height = 2.666015625,
    name = "hyphen",
    width = 3.681640625,
  },
  {
    codepoint = 3,
    depth = 0,
    height = 0,
    name = "space",
    width = 2.9296875,
  },
  {
    codepoint = 552,
    depth = 2.24609375,
    height = 6.2841796875,
    name = "uni06CB",
    width = 4.0087890625,
  },
  {
    codepoint = 2226,
    depth = 0.048828125,
    height = 4.580078125,
    name = "uni06D5.fina",
    width = 3.7939453125,
  },
  {
    codepoint = 3024,
    depth = 0.0048828125,
    height = 5.078125,
    name = "uni0626.medi_BaaBaaInit",
    width = 1.6845703125,
  },
  {
    codepoint = 3732,
    depth = 0.0634765625,
    height = 4.8779296875,
    name = "uni062A.init_BaaBaaIsol",
    width = 3.193359375,
  },
}
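
To make the stray glyph easier to spot, here is a rough sketch that scans a glyph list of the shape printed above and adds up the width of every glyph named "space". The `glyphs` table repeats only the names and widths from the output above; `stray_space_width` is a hypothetical helper for illustration, not part of SILE or HarfBuzz.

```lua
-- Names and widths copied from the shaper output above, in the same order.
local glyphs = {
  { name = "hyphen", width = 3.681640625 },
  { name = "space", width = 2.9296875 },
  { name = "uni06CB", width = 4.0087890625 },
  { name = "uni06D5.fina", width = 3.7939453125 },
  { name = "uni0626.medi_BaaBaaInit", width = 1.6845703125 },
  { name = "uni062A.init_BaaBaaIsol", width = 3.193359375 },
}

-- Hypothetical helper: total width taken up by glyphs named "space".
local function stray_space_width(list)
  local total = 0
  for _, g in ipairs(list) do
    if g.name == "space" then
      total = total + g.width
    end
  end
  return total
end

print(stray_space_width(glyphs))
```

Since the input contains no space character at all, any non-zero result here is width that came from the ZWJ.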

Making the case even simpler:

> SILE.shaper:shapeToken(zwj, SILE.font.loadDefaults({ font = "Amiri", direction = "RTL" }))
{
  {
    codepoint = 3,
    depth = 0,
    height = 0,
    name = "space",
    width = 2.9296875,
  },
}

I would have hoped that a zero-width joiner had... zero width.
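
For anyone wanting to reproduce this outside SILE, the same input string can be built in plain Lua without `SU.utf8char`. This is a minimal sketch, assuming the source file is saved as UTF-8; `with_zwj_hyphen` is a made-up name for illustration and not a SILE or HarfBuzz function.

```lua
-- U+200D ZERO WIDTH JOINER encoded as UTF-8 (bytes E2 80 8D).
local zwj = "\226\128\141"

-- Hypothetical helper: append ZWJ and a hyphen so the last letter of the
-- fragment is still asked to join across the break.
local function with_zwj_hyphen(fragment)
  return fragment .. zwj .. "-"
end

-- The same fragment as in the transcript above.
local text = with_zwj_hyphen("تئەۋ")

-- Print the byte length as a quick sanity check on the encoding.
print(#text)
```

The resulting byte string can then be handed to whichever shaper front end is being tested.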

