[HarfBuzz] Control characters inside ligatures

Khaled Hosny khaledhosny at eglug.org
Mon Dec 7 01:53:36 PST 2015


On Mon, Dec 07, 2015 at 09:14:19AM +0100, Behdad Esfahbod wrote:
> On 15-12-05 03:31 PM, Khaled Hosny wrote:
> > Hi,
> > 
> > I just noticed that when there is a control character between characters
> > that form a ligature, there is a zero width space after the ligature
> > with a cluster value of the first character in the ligature, for
> > example:
> > 
> > $ hb-unicode-encode U+0066,U+200C,U+0069 | hb-shape amiri-regular.ttf
> > [f_i=0+1064|space=0+0]
> > 
> > or 
> > 
> > $ hb-unicode-encode U+0066,U+00AD,U+0069 | hb-shape amiri-regular.ttf 
> > [f_i=0+1064|space=0+0]
> > 
> > This is rather surprising as I was expecting the control character to be
> > consumed inside the ligature and only the ligature glyph would remain. I
> > think the current behaviour makes mapping glyphs to text indices harder
> > in this case. WDYT?
> 
> I don't think it makes any difference.  It's a zero-width glyph, so it
> contributes nothing to the cluster as a whole, so you still have to divide the
> sum of the widths of the glyphs by the number of cursor stops and that works
> the same both ways.  No?

I was thinking in terms of line breaks. Since the soft hyphen is a break
opportunity, I need to know that the sequence <f><soft hyphen><i> became
the <fi> glyph, but I’m not sure how to do that when the extra glyph
has the same cluster value. But maybe I’m looking at it from the wrong
angle, and I simply need to reshape the left side (probably with a real
hyphen) and the right side separately, and just break the line there.
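
A rough sketch of that second approach, in Python and working on the text
only: split at the soft hyphen, append a visible hyphen to the left part,
and hand each part to the shaper separately. The function name is
hypothetical, and using U+002D as the "real hyphen" is an assumption
(U+2010 may be preferable if the font has it). The shaping step itself is
left out.

```python
SOFT_HYPHEN = "\u00AD"
VISIBLE_HYPHEN = "-"  # assumption: U+002D; U+2010 is another option


def split_at_soft_hyphen(text, index):
    """Split text at the soft hyphen found at `index`.

    Returns (left, right): left ends with a visible hyphen, right starts
    with the character after the soft hyphen. Each part is meant to be
    shaped on its own, so no ligature can form across the break.
    """
    if text[index] != SOFT_HYPHEN:
        raise ValueError("no soft hyphen at index %d" % index)
    left = text[:index] + VISIBLE_HYPHEN
    right = text[index + 1:]
    return left, right


left, right = split_at_soft_hyphen("f\u00ADi", 1)
print(repr(left), repr(right))
```

With this, I never need to find the break position inside the ligature
glyph; the ligature is simply not formed when the break is taken.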

Regards,
Khaled

