[HarfBuzz] Ligatures

Eli Zaretskii eliz at gnu.org
Fri May 22 19:32:04 UTC 2020


Hi,

This is a bit off-topic, but I thought it would be appropriate to ask
here, since some of the best experts on this subject are on this list.

We are discussing support for ligatures in Emacs, specifically when
using HarfBuzz as the shaping engine.  See the discussion starting at

  https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html

The current support for producing ligatures works in the same way as
complex text shaping for scripts that require that, like Arabic and
Khmer: the sequences of characters that can be displayed as ligatures
are identified in advance with suitable regular expressions, and the
display engine then passes these sequences to hb_shape to produce the
ligatures.
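
To illustrate the approach described above, here is a minimal sketch
in Python.  It is not the actual Emacs code: the candidate list and
the function name split_runs are hypothetical, chosen only for
illustration.  It finds the sequences that match a precomputed
regular expression and marks them as the only parts of the text that
would be handed to the shaper.

```python
import re

# Hypothetical, hand-written list of ligature candidates.  A real list
# would depend on the font; this one exists only for illustration.
LIGATURE_CANDIDATES = ["ffi", "ffl", "ff", "fi", "fl", "Th", "==>", "=>", "->"]

# Put longer alternatives first so "ffi" wins over "ff" and "fi".
_PATTERN = re.compile(
    "|".join(
        re.escape(seq)
        for seq in sorted(LIGATURE_CANDIDATES, key=len, reverse=True)
    )
)


def split_runs(text):
    """Split text into (chunk, needs_shaping) pairs.

    Chunks with needs_shaping == True are the ones that would be
    passed to the shaping engine; everything else is displayed as-is.
    """
    runs = []
    pos = 0
    for match in _PATTERN.finditer(text):
        if match.start() > pos:
            runs.append((text[pos:match.start()], False))
        runs.append((match.group(), True))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], False))
    return runs
```

With this sketch, any sequence that is missing from the candidate
list never reaches the shaper, whatever the font supports.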

This works well for scripts that require complex shaping, because such
scripts generally have well-defined rules for the sequences of
codepoints that need shaping.  My original thoughts were that
ligatures could be supported in the same way, based on the assumption
that the list of possible ligatures is finite and can be stored in a
suitable data structure in advance.

However, I'm being told that this assumption is false, and that each
font defines ligatures from any number of arbitrary combinations of
characters, and therefore the exhaustive list of the ligatures is in
practice infinite and cannot be provided in advance.  The only way of
doing this right, I'm told, is to either (a) query the font to get the
list of all the ligatures it supports, or (b) assume any combination
of characters can produce a ligature, and therefore we need to pass
all the characters intended for display through hb_shape.  The latter
in particular is in stark contrast to how the current Emacs display
code is designed and implemented.
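
As a rough illustration of alternative (b), here is a toy model in
Python.  It is only a sketch under stated assumptions: the font is
modeled as a plain dictionary from character sequences to glyph
names, and shape_all is a hypothetical stand-in for a shaper, not
HarfBuzz's API or its data structures.  The point is that the caller
passes all of the text and needs no advance knowledge of which
sequences form ligatures.

```python
def shape_all(text, font_ligatures):
    """Toy stand-in for a shaper: greedy longest-match substitution.

    font_ligatures maps a character sequence to a glyph name.  This
    is a hypothetical model of a font, used only for illustration.
    """
    max_len = max((len(seq) for seq in font_ligatures), default=1)
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest possible sequence first, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            seq = text[i:i + n]
            if seq in font_ligatures:
                glyphs.append(font_ligatures[seq])
                i += n
                break
        else:
            # No ligature starts here; emit the character unchanged.
            glyphs.append(text[i])
            i += 1
    return glyphs
```

In this model the same text produces different output for different
fonts, and the caller's code does not change:
shape_all("a==>b", {"==>": "arrow"}) gives ["a", "arrow", "b"], while
an empty dictionary gives one glyph per character.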

To be specific, I'm talking about two kinds of ligatures:

  . ligatures made of Latin characters, like "ffi" and "Th"
  . ligatures produced from symbols, like "==>", which is
    displayed as ⟹

Can someone please tell me what the recommended practices are for
these ligatures?  Is the set of possible ligatures indeed infinite and
impossible to know in advance?  And does HarfBuzz have APIs to query a
font about the ligatures it supports?

Thanks in advance for any help.

