[HarfBuzz] Ligatures

Khaled Hosny dr.khaled.hosny at gmail.com
Sun May 24 16:00:45 UTC 2020



> On May 24, 2020, at 5:41 PM, Eli Zaretskii <eliz at gnu.org> wrote:
> 
>>> I almost understand (and agree), sans one part: the "arbitrary parts"
>>> of what you wrote.  If we want to produce a ligature out of "ffi", the
>>> shaper will get "ffi" and nothing more.  Which part here is arbitrary?
>> 
>> Sending "ffi" alone is an arbitrary decision. The font might have kerning between "ffi" and what comes before and after it, but you won't get it. The font might not have a ligature for "ffi" at all but use kerning instead, so you will get kerning between the "ffi" glyphs and not between other glyphs, which is arbitrary. It might be a cursive font that changes glyph shapes based on surrounding glyphs, and you will get that for "ffi" and not elsewhere, which is arbitrary.
>> 
>> That is just plain wrong; there is no way around it.
> 
> So, to make sure I understand the correct solution: you are saying
> that all the text to be displayed should go through the shaper, is
> that right?
> 
> If so, how large should be the chunks of text to be passed to the
> shaper in any one call, in order to have a correct result?  Would it
> be enough to pass whitespace-separated words one by one? or do we need
> to send entire physical lines (up to the terminating newline
> character)? or maybe an entire paragraph?  What is the recommendation
> here?

In general, the safest approach is to pass the whole paragraph of text, along with the start and length of each item (an item being a run that shares the same font, direction, script, and language).
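A minimal sketch of what "whole paragraph plus start and length" means in practice, assuming the per-character font, direction, script, and language have already been decided by an earlier bidi/script/font-fallback pass (not shown). `RunProps`, `itemize`, and `shaping_requests` are hypothetical names. The offsets are code-point indices, which is the unit HarfBuzz's `hb_buffer_add_utf32()` and `hb_buffer_add_codepoints()` use for `item_offset` and `item_length` (with `hb_buffer_add_utf8()` they would be byte offsets instead).

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

@dataclass(frozen=True)
class RunProps:
    """Hypothetical per-character attributes that decide item boundaries."""
    font: str       # font identifier (illustrative only)
    direction: str  # "ltr" or "rtl"
    script: str     # ISO 15924 code, e.g. "Latn", "Arab"
    language: str   # BCP 47 tag, e.g. "en", "ar"

def itemize(props: List[RunProps]) -> List[Tuple[int, int, RunProps]]:
    """Return (start, length, props) for each maximal run of equal props."""
    items = []
    start = 0
    for key, group in groupby(props):
        length = sum(1 for _ in group)
        items.append((start, length, key))
        start += length
    return items

def shaping_requests(paragraph: str, props: List[RunProps]) -> List[dict]:
    """One request per item. The text is always the *whole* paragraph;
    only item_offset/item_length change, so the shaper sees the context."""
    if len(paragraph) != len(props):
        raise ValueError("need exactly one RunProps per character")
    return [
        {"text": paragraph, "item_offset": start, "item_length": length, "props": p}
        for start, length, p in itemize(props)
    ]

# Usage: a Latin run followed by an Arabic word in a different font.
latin = RunProps("Serif", "ltr", "Latn", "en")
arabic = RunProps("Naskh", "rtl", "Arab", "ar")
para = "hi \u0628\u0646\u062A"
for req in shaping_requests(para, [latin] * 3 + [arabic] * 3):
    print(req["item_offset"], req["item_length"], req["props"].script)
```

Each request would become one shaping call (one buffer per item); the point of the design is that no call ever receives a truncated string.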

This, for example, ensures that HarfBuzz can do basic Arabic-like shaping across item boundaries: if you break items in the middle of an Arabic word (due to a font change, for example), you still get the appropriate initial/medial/final forms across the boundary. It also lets HarfBuzz put a combining mark at the start of a paragraph on a dotted circle, since such a mark otherwise has no base.
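A toy illustration of the Arabic case, assuming a deliberately simplified joining model: only a handful of letters that are dual-joining in Unicode's ArabicShaping.txt are recognised, and transparent marks and other joining types are ignored. `positional_forms` is a hypothetical helper, not HarfBuzz code. It shows the word "\u0628\u0646\u062A" split into two items after its first letter, and how the forms differ depending on whether the neighbours outside the item are visible.

```python
# Toy model: only these letters (beh, teh, seen, lam, meem, noon, yeh),
# all dual-joining in ArabicShaping.txt, are treated as joining; every
# other character is treated as non-joining. Real shapers use full data.
DUAL_JOINING = set("\u0628\u062A\u0633\u0644\u0645\u0646\u064A")

# (joins to previous, joins to next) -> OpenType positional feature tag
FORMS = {
    (False, False): "isol",
    (False, True): "init",
    (True, True): "medi",
    (True, False): "fina",
}

def positional_forms(text, start, length, with_context=True):
    """Positional form of each character in text[start:start + length].

    With context, neighbours outside the item are consulted; without it,
    the item is treated as if it were the entire text."""
    lo, hi = (0, len(text)) if with_context else (start, start + length)
    forms = []
    for i in range(start, start + length):
        if text[i] not in DUAL_JOINING:
            forms.append("none")
            continue
        joins_prev = i - 1 >= lo and text[i - 1] in DUAL_JOINING
        joins_next = i + 1 < hi and text[i + 1] in DUAL_JOINING
        forms.append(FORMS[(joins_prev, joins_next)])
    return forms

word = "\u0628\u0646\u062A"  # beh, noon, teh
# Item 1 = first letter, item 2 = the rest (e.g. because of a font change).
print(positional_forms(word, 0, 1), positional_forms(word, 1, 2))
print(positional_forms(word, 0, 1, False), positional_forms(word, 1, 2, False))
```

With context the word keeps init/medi/fina across the split; without it, the first letter becomes isolated and the second becomes initial, which breaks the connection in the middle of the word.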

If this is not possible, you can try to pass enough context, for example by reaching backward and forward to the first character that is not a combining mark. This may or may not be enough.
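A minimal sketch of that fallback, assuming Python's `unicodedata` general categories Mn, Mc, and Me as the definition of "combining mark". `context_window` and `request_with_context` are hypothetical names. The returned slice would be used as the buffer text, with the item itself selected by `item_offset` and `item_length`, exactly as in the whole-paragraph case.

```python
import unicodedata
from typing import Tuple

def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")

def context_window(text: str, start: int, end: int) -> Tuple[int, int]:
    """Widen the item [start, end) so that, on each side, it also covers
    any adjacent combining marks plus the first non-mark character."""
    lo = start
    while lo > 0:
        lo -= 1
        if not is_mark(text[lo]):
            break  # found the base character before the item
    hi = end
    while hi < len(text):
        hi += 1
        if not is_mark(text[hi - 1]):
            break  # found the first non-mark after the item
    return lo, hi

def request_with_context(text: str, start: int, end: int) -> Tuple[str, int, int]:
    """Return (context_text, item_offset, item_length) for one shaping call."""
    lo, hi = context_window(text, start, end)
    return text[lo:hi], start - lo, end - start

# An item that starts with a combining acute accent still sees its base "a".
print(request_with_context("xa\u0301y", 2, 3))
```

As the message says, this heuristic is not guaranteed to be sufficient: a font's contextual rules can look further than one base character in either direction.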

Shaping space-delimited words is orthogonal to this; context should always be provided.

Some fonts do have OpenType lookups that interact with space (e.g. kerning pairs involving space, or even substitutions involving space), so shaping words independently will give suboptimal results. You can use the HarfBuzz API to find out whether the font has OpenType layout rules involving space, or decide to live with this limitation. Firefox does this check because it wants to cache individually shaped words when possible. Chrome used to do that too, but I think they now make sure to retain enough information to avoid unnecessary reshaping, so such a word cache is not needed.
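A minimal sketch of a word cache gated on that check. The `shape` callback and the `font_uses_space` flag are hypothetical stand-ins: in a real program the flag would be computed once per font, for instance by collecting lookups and their glyphs with HarfBuzz functions such as `hb_ot_layout_collect_lookups()` and `hb_ot_layout_lookup_collect_glyphs()` and testing for the space glyph. This is not Firefox's actual code.

```python
import re
from typing import Callable, Dict, List, Tuple

Glyphs = List[Tuple[int, int]]  # hypothetical (glyph id, advance) pairs

class WordCache:
    """Cache shaped words, but only for fonts whose OpenType rules do not
    involve the space glyph; otherwise shape the whole run in context."""

    def __init__(self, shape: Callable[[str, str], Glyphs]):
        self._shape = shape  # hypothetical shaper: (text, font) -> glyphs
        self._cache: Dict[Tuple[str, str], Glyphs] = {}

    def shape_line(self, text: str, font: str, font_uses_space: bool) -> Glyphs:
        if font_uses_space:
            # Space takes part in kerning/substitution: no word splitting.
            return self._shape(text, font)
        glyphs: Glyphs = []
        # Split into words and single spaces, keeping the spaces as tokens.
        for token in re.split(r"( )", text):
            if not token:
                continue
            key = (token, font)
            if key not in self._cache:
                self._cache[key] = self._shape(token, font)
            glyphs.extend(self._cache[key])
        return glyphs

# Usage with a trivial stand-in shaper (one glyph per character).
cache = WordCache(lambda text, font: [(ord(c), 1) for c in text])
glyphs = cache.shape_line("to be or not to be", "Serif", font_uses_space=False)
```

Note that even when the flag is false, words are shaped without their neighbours, so the general advice above about passing context still applies; the cache only avoids the specific problem of rules that cross spaces.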

Regards,
Khaled
