[HarfBuzz] Control characters inside ligatures

Behdad Esfahbod behdad.esfahbod at gmail.com
Mon Dec 7 01:59:24 PST 2015


On 15-12-07 10:53 AM, Khaled Hosny wrote:
> On Mon, Dec 07, 2015 at 09:14:19AM +0100, Behdad Esfahbod wrote:
>> On 15-12-05 03:31 PM, Khaled Hosny wrote:
>>> Hi,
>>>
>>> I just noticed that when there is a control character between characters
>>> that form a ligature, there is a zero width space after the ligature
>>> with a cluster value of the first character in the ligature, for
>>> example:
>>>
>>> $ hb-unicode-encode U+0066,U+200C,U+0069 | hb-shape amiri-regular.ttf
>>> [f_i=0+1064|space=0+0]
>>>
>>> or 
>>>
>>> $ hb-unicode-encode U+0066,U+00AD,U+0069 | hb-shape amiri-regular.ttf 
>>> [f_i=0+1064|space=0+0]
>>>
>>> This is rather surprising as I was expecting the control character to be
>>> consumed inside the ligature and only the ligature glyph would remain. I
>>> think the current behaviour makes mapping glyphs to text indices harder
>>> in this case. WDYT?
>>
>> I don't think it makes any difference.  It's a zero-width glyph, so it
>> contributes nothing to the cluster as a whole, so you still have to divide the
>> sum of the widths of the glyphs by the number of cursor stops and that works
>> the same both ways.  No?
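
To illustrate the arithmetic described above, here is a minimal Python sketch. It does not use the HarfBuzz API: `caret_offsets` is a hypothetical helper, and the count of two cursor stops for the cluster is an assumption made for this example. The advances are taken from the hb-shape output quoted earlier.

```python
def caret_offsets(advances, cursor_stops):
    """Evenly divide a cluster's total advance among its cursor stops.

    advances     -- advance widths of every glyph sharing one cluster value
    cursor_stops -- number of caret stops in the cluster (assumed known by
                    the caller, e.g. from grapheme segmentation)
    """
    total = sum(advances)
    step = total / cursor_stops
    # Offsets of the caret positions, including both cluster edges.
    return [i * step for i in range(cursor_stops + 1)]


# Cluster as shaped today: the ligature plus a zero-width space glyph.
with_extra_glyph = caret_offsets([1064, 0], cursor_stops=2)

# Cluster as it would look if the control character were consumed.
ligature_only = caret_offsets([1064], cursor_stops=2)

print(with_extra_glyph)  # [0.0, 532.0, 1064.0]
print(ligature_only)     # [0.0, 532.0, 1064.0]
```

The zero-width glyph adds nothing to the sum, so both cases give the same caret positions.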
> 
> I was thinking in terms of line breaks, since the soft hyphen is a break
> opportunity I need to know that the sequence <f><soft hyphen><i> became
> the <fi> glyph, but I’m not sure how to do that with the extra glyph
> with the same cluster value.
> But maybe I’m looking at it from the wrong
> angle, and I simply need to reshape the left side (probably with a real
> hyphen) and the right side and just break the line there.

Correct.  There's no easy way around reshaping.  Firefox has a bug open for this.
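
A minimal Python sketch of that reshaping approach, under stated assumptions: `split_at_soft_hyphen` and `reshape_around_break` are hypothetical helpers, `shape` stands for whatever shaping call the client already uses (it is not a HarfBuzz function), and a plain ASCII hyphen-minus is assumed as the visible hyphen.

```python
SOFT_HYPHEN = "\u00AD"
VISIBLE_HYPHEN = "-"  # assumption: the line ends with a plain hyphen-minus


def split_at_soft_hyphen(text, index):
    """Split text at the soft hyphen found at text[index].

    Returns (left, right): left ends with a visible hyphen in place of the
    soft hyphen, right starts with the character after it.
    """
    if text[index] != SOFT_HYPHEN:
        raise ValueError("no soft hyphen at index %d" % index)
    left = text[:index] + VISIBLE_HYPHEN
    right = text[index + 1:]
    return left, right


def reshape_around_break(text, index, shape):
    """Shape both sides of the break separately.

    shape is a caller-supplied function (hypothetical here) that maps a
    string to shaped glyphs. Shaping the two halves independently means a
    ligature such as f_i cannot span the line break.
    """
    left, right = split_at_soft_hyphen(text, index)
    return shape(left), shape(right)


# Stand-in "shaper" for demonstration only: one pseudo-glyph per character.
left_glyphs, right_glyphs = reshape_around_break("f\u00ADi", 1, list)
print(left_glyphs)   # ['f', '-']
print(right_glyphs)  # ['i']
```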

b

