[HarfBuzz] Ligatures

Sat May 23 15:54:51 UTC 2020

On Sat, 23 May 2020 17:22:58 +0300
Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 23 May 2020 14:51:53 +0100
> > From: Richard Wordingham <richard.wordingham at ntlworld.com>
> >   
> > > > They may of course have more than one set of such rules, with
> > > > the rule sets defining different sets of sequences.    
> > > 
> > > Who are "they" in this context?  
> > 
> > Devanagari and Tai Tham are two examples I am aware of.  
> 
> Emacs supports more than one rule for each composable sequence of
> characters.

That doesn't help when the rules give conflicting divisions into
clusters, which is the case with Tai Tham.

On the other hand, for the Devanagari scripts, the rules can store
alternatives which some renderers would consider ill-formed, or which
would be more sensibly treated as two clusters.

> > Devanagari has different rules for positioning Vedic marks between
> > fonts using the script tags 'deva' and 'dev2' on the one hand, and
> > fonts using the unofficial script tag 'dev3', which follows the USE
> > (Universal Shaping Engine) rules for character ordering, on the
> > other.  For tag 'deva', Microsoft says that <consonant, virama,
> > candrabindu, consonant> is one cluster; others, including Unicode,
> > say it's two.  Candrabindu in the middle and candrabindu at the end
> > mean different things: the former nasalises a consonant, while the
> > latter nasalises a vowel.  The visual distinction exists, at least
> > when half-forms are used.
> 
> See the rules set up near the end of indian.el in Emacs.  If they
> don't cover what you describe, we can add more.
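
To make the two conventions concrete, here is a toy segmenter for
<consonant, virama, candrabindu, consonant>, e.g. U+0915 U+094D U+0901
U+0915.  It is a hypothetical sketch, not Uniscribe's, HarfBuzz's or
Unicode's actual algorithm: it only knows the consonants KA..HA, virama
and candrabindu, and the flag `candrabindu_inside_cluster` (an invented
name) switches between the one-cluster reading Microsoft uses for the
older Devanagari tag and the two-cluster reading.

```python
VIRAMA = "\u094D"
CANDRABINDU = "\u0901"

def is_consonant(ch):
    # Simplification: only the basic Devanagari consonants KA..HA.
    return "\u0915" <= ch <= "\u0939"

def segment_clusters(text, candrabindu_inside_cluster):
    """Hypothetical toy segmenter contrasting two cluster conventions."""
    clusters = []
    for ch in text:
        if not clusters:
            clusters.append(ch)
        elif is_consonant(ch):
            prev = clusters[-1]
            # A consonant joins the current cluster after a virama
            # (a conjunct), or after virama + candrabindu when the
            # one-cluster convention is selected.
            if prev.endswith(VIRAMA) or (
                candrabindu_inside_cluster
                and prev.endswith(VIRAMA + CANDRABINDU)
            ):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        else:
            # Virama, candrabindu and other marks attach to the cluster.
            clusters[-1] += ch
    return clusters

seq = "\u0915\u094D\u0901\u0915"
print(segment_clusters(seq, True))   # one cluster
print(segment_clusters(seq, False))  # two clusters
```

The only difference between the two readings is whether a consonant may
continue a cluster once a candrabindu has followed the virama.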

The Devanagari rule only covers the Vedic marks in the Devanagari block,
the 'stress signs' according to the comments.  Can rules for what are
essentially different scripts now share combining marks?  The newer
Vedic marks (those in the Vedic Extensions block) were supposed to be
available to at least all the Indic scripts of India.

> > > If a font requires special shaping for any sequence of any number
> > > of 26 (or maybe 52) ASCII letters, then the Emacs display engine
> > > will need to be redesigned.  So this extreme possibility doesn't
> > > bother me.  
> > 
> > In general, they do require it.  But how is this worse than handling
> > Arabic?  
> 
> I don't know.  Maybe it isn't.  Or maybe the slowdown while displaying
> ASCII and moving the cursor through it will be unbearable.
> 
> > Is the problem that you want to keep the option of line
> > wrapping splitting words for ASCII, but are not bothered for Arabic
> > or other human languages?  
> 
> Does Emacs indeed fail to wrap Arabic text?  Can you show an example?

Character-level wrapping still almost works in Emacs 24.4, but I don't
know whether later enhancements have broken it.  Three features make me
think that Emacs 24.4 may differ from the current state of affairs:

(1) Clicking into the text breaks text before the cursor, but not after
it.
(2) I can't step into lam-alif the way I step into Indic clusters.
(3) Lam-alif isn't broken by line wrap.
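
The difference between point (3) and true character-level wrapping can
be sketched as follows.  Both functions are hypothetical illustrations,
not Emacs code: `wrap_chars` breaks every `width` characters and so can
split lam-alif, while `wrap_clusters` assumes the text has already been
segmented into clusters (lam-alif as one entry) and breaks only between
them.

```python
def wrap_chars(text, width):
    """Naive character-level wrap: may split a cluster such as lam-alif."""
    return [text[i:i + width] for i in range(0, len(text), width)]

def wrap_clusters(clusters, width):
    """Greedy wrap that breaks only between pre-segmented clusters.

    `width` counts characters per line.  A cluster longer than `width`
    is put on a line of its own rather than split.
    """
    lines, current = [], ""
    for cluster in clusters:
        if current and len(current) + len(cluster) > width:
            lines.append(current)
            current = ""
        current += cluster
    if current:
        lines.append(current)
    return lines

# beh, lam, alef, beh in logical order; lam + alef form one cluster.
text = "\u0628\u0644\u0627\u0628"
print(wrap_chars(text, 2))                                   # lam and alef split
print(wrap_clusters(["\u0628", "\u0644\u0627", "\u0628"], 2))  # lam-alif kept whole
```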

> > I think you mean that Emacs would store the position of components
> > by an index that was the sequence of characters, not the glyph ID.
> > That would also deal with precomposed characters - it would be the
> > character sequence that mattered, and for cursor movement and
> > rendering, the canonically equivalent sequence(s) and the
> > precomposed character would remain distinct.  
> 
> Sorry, I don't follow: what do you mean by "store"?  Emacs stores the
> rules used to compose characters, and it stores the results of the
> compositions already done by applying those rules, as part of
> displaying some chunk of text.  Which one of these did you have in
> mind?

Neither.  I thought from the Emacs developers' discussion that you were
hoping to store the locations of the character boundaries within
ligatures. 
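
A minimal sketch of what storing character boundaries within a ligature
might look like, assuming each ligature glyph is recorded together with
the character sequence it was formed from and the x offsets of the
internal boundaries (similar in spirit to ligature caret positions).
`LigatureRecord`, its fields and the numbers used are hypothetical.
Because the record is keyed on the character sequence rather than the
glyph ID, a precomposed character and its canonically equivalent
decomposed sequence get different numbers of cursor stops even when
they render with the same glyph.

```python
from dataclasses import dataclass

@dataclass
class LigatureRecord:
    """Hypothetical record: one ligature glyph plus the characters it covers."""
    glyph_id: int
    chars: str            # character sequence the ligature was formed from
    width: float          # advance width of the ligature glyph
    carets: list[float]   # x offsets of internal boundaries, len(chars) - 1 of them

    def __post_init__(self):
        if len(self.carets) != max(len(self.chars) - 1, 0):
            raise ValueError("need one caret per internal character boundary")

    def caret_x(self, char_index):
        """x offset of the boundary before chars[char_index] (0..len(chars))."""
        if not 0 <= char_index <= len(self.chars):
            raise IndexError(char_index)
        if char_index == 0:
            return 0.0
        if char_index == len(self.chars):
            return self.width
        return self.carets[char_index - 1]

    def cursor_stops(self):
        """All positions the cursor can occupy while stepping through."""
        return [self.caret_x(i) for i in range(len(self.chars) + 1)]

ffi = LigatureRecord(glyph_id=42, chars="ffi", width=30.0, carets=[10.0, 20.0])
print(ffi.cursor_stops())

# Same glyph, different character sequences (illustrative offsets).
precomposed = LigatureRecord(7, "\u00e9", 8.0, [])
decomposed = LigatureRecord(7, "e\u0301", 8.0, [8.0])
print(len(precomposed.cursor_stops()), len(decomposed.cursor_stops()))
```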

Richard.