[HarfBuzz] Ligatures
Richard Wordingham
richard.wordingham at ntlworld.com
Sat May 23 20:42:24 UTC 2020
On Sat, 23 May 2020 19:45:17 +0300
Eli Zaretskii <eliz at gnu.org> wrote:
> > Date: Sat, 23 May 2020 16:54:51 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> > Cc: harfbuzz at lists.freedesktop.org
> >
> > > Emacs supports more than one rule for each composable sequence of
> > > characters.
> >
> > That doesn't help when the rules give conflicting divisions into
> > clusters, which is the case with Tai Tham.
>
> The assumption is that either the rules can be arranged in an order
> that allows to use the first matching rule, or, failing that, that you
> write your own composing function that implements whatever logic
> that's required to select the right rule.
That choice needs to be tied to the choice of font - or for Tai Tham you use
my hack technique. However, it's not as bad as it could be. There's
something strange going on in Tai Tham even at Emacs 27.05. I can have
two aksharas interacting for shaping, but it takes two 'ordinary' key
advances to pass through it, apparently implying that there are two
clusters. Clusters for cursor advancement and clusters for shaping seem
to be controlled independently!
From the dotted circle insertion logic, Emacs 27.05 on my machine
definitely looks as though it's using some form of HarfBuzz.
> > The Devanagari rule only covers the Vedic marks in the Devanagari
> > block, the 'stress signs' according to the comments. Can rules
> > essentially for different scripts now share combining marks? The
> > newer Vedic marks were supposed to be available to at least all
> > Indian Indic scripts.
>
> I don't know enough about this to make sure I even understand the
> question, let alone can provide an answer. One thing I can say is
> that the regexp pattern in a rule can specify different context (the
> surrounding characters) even if the character that triggers the rule
> is the same. Failing that, I guess the solution will again be the
> function that produces the composition.
>
> As for different scripts: if the character codepoints are the same,
> Emacs currently assigns each character to a single script.
I'll need to dig deeper. Composition of both 'a' and Greek alpha with
an acute accent works, which suggests that the problem isn't there for
characters with a script property of 'inherited'.
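One way to check which single script Emacs assigns to a given character is to look it up in `char-script-table`. The sketch below is only illustrative: the sample characters (Latin 'a', Greek alpha, the combining acute accent, and one mark from the Vedic Extensions block) are my own choice, and no particular result is assumed.

```elisp
;; Sketch: report the script Emacs assigns to a few characters.
;; `char-script-table' is a standard Emacs char-table; the sample
;; characters below are arbitrary choices for illustration.
(dolist (ch (list ?a #x03B1 #x0301 #x1CD0))
  (message "U+%04X -> %S" ch (aref char-script-table ch)))
```

Evaluating this in the *scratch* buffer writes one line per character to *Messages*.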
> > > Does Emacs indeed fail to wrap Arabic text? can you show an
> > > example?
> >
> > Character level wrapping still almost works down at Emacs 24.4, but
> > I don't know that it wasn't broken in later enhancements. There
> > are three features that make me think Emacs 24.4 might be different
> > to the current state of affairs:
> >
> > (1) Clicking into the text breaks text before the cursor, but not
> > after it.
> > (2) I can't step into lam-alif the way I step into Indic clusters.
> > (3) Lam-alif isn't broken by line wrap.
>
> Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
> instead, it has several bugs in this area fixed, and will use HarfBuzz
> if available at build time.
The behaviour in 27.05 is almost the same as for 24.4, but the
breaking in item (1) is automatically repaired. The process seems slow
- I can see the glyph become final and then revert back to being
medial. I'm puzzled by not being able to step into lam-alif but being
able to step through a series of 'beh's. The step-into command for
advancing codepoint by codepoint semi-works. The cluster shaping
doesn't break at the cursor - Handa gave me a C code fix so I could
achieve that - but the number of step-into steps needed to pass through
a cluster matches the number of codepoints.
Pressing the 'delete' key still deletes a single character, but that may
be because it's mapped to tpu-delete-current-char.
So, what's not working in Arabic is that one can't move the cursor
through ligatures. It seems one can advance point through them
using a step-into command (dead reckoning is a useful fallback), but one
loses visual feedback. Apart from that important matter, it looks as
though Arabic in Emacs already has the behaviours needed for shaping
Latin words. The stepping into is enabled by the command "(setq
disable-point-adjustment t)".
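As a rough sketch of such a step-into command: as I understand the Emacs Lisp manual, the command loop resets `disable-point-adjustment` to nil before each command, so the variable has to be set inside the command itself and then affects only that command; `global-disable-point-adjustment` is the persistent counterpart. The command names below are made up for illustration and are not part of Emacs.

```elisp
;; Hypothetical commands; the names are invented for this sketch.
(defun my-step-into-forward (&optional n)
  "Move point forward N characters, letting it rest inside a composition."
  (interactive "p")
  (forward-char (or n 1))
  ;; Suppress point adjustment for this command only, so point is not
  ;; moved back out to the cluster boundary afterwards.
  (setq disable-point-adjustment t))

(defun my-step-into-backward (&optional n)
  "Move point backward N characters, letting it rest inside a composition."
  (interactive "p")
  (backward-char (or n 1))
  (setq disable-point-adjustment t))
```

These could be bound to spare keys, leaving the ordinary cursor keys with their cluster-wise movement.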
Richard.