[HarfBuzz] Order of combining diacriticals

Eli Zaretskii eliz at gnu.org
Thu Jun 13 09:18:34 UTC 2019


> Date: Wed, 12 Jun 2019 22:24:12 +0200
> From: Khaled Hosny <dr.khaled.hosny at gmail.com>
> Cc: harfbuzz at lists.freedesktop.org
> 
> On Wed, Jun 12, 2019 at 10:22:48PM +0300, Eli Zaretskii wrote:
> > In Emacs, we use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES cluster
> > level, because HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS produced
> > incorrect display.
> 
> The cluster levels shouldn’t affect display, the glyph positions are
> exactly the same for all the three:

Thanks, I guess I was misremembering something I've read in the
HarfBuzz docs.

> >   U+05D1 HEBREW LETTER BET
> >   U+05B0 HEBREW POINT SHEVA
> >   U+05BC HEBREW POINT DAGESH
> 
> > 
> > I need to type them in the above order; if I type DAGESH before SHEVA,
> > the produced display is incorrect.
> 
> The glyph order and positions are the same regardless of the input order
> (which is what I’d expect since HarfBuzz normalizes mark order), the
> only difference is cluster values which is also expected AFAICT:
> 
> $ hb-shape NotoSerifHebrew-Regular.ttf --unicodes="U+05D1,U+05B0,U+05BC" --cluster-level=1
> [uni05B0=1 at 178,0+0|uni05BC=1 at 153,0+0|uni05D1=0+539]
> 
> $ hb-shape NotoSerifHebrew-Regular.ttf --unicodes="U+05D1,U+05BC,U+05B0" --cluster-level=1
> [uni05B0=2 at 178,0+0|uni05BC=1 at 153,0+0|uni05D1=0+539]
>  
> > Is this expected with level-0 clusters?  Or should I look for a bug in
> > how Emacs uses HarfBuzz?
> 
> Might be a result of hb_buffer_reverse_clusters() used by Emacs.

Since we use cluster level 0 (HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES),
there's only one cluster in this case, whatever the order of the
characters in the original text.  So cluster reversal cannot (and does
not) have any effect here.
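
To illustrate the single-cluster point, here is a toy model in Python.
It is not HarfBuzz code: level0_clusters is a name made up for this
sketch, and it assumes that at level 0 every combining mark simply
joins the cluster of the base character before it, which simplifies
what HarfBuzz actually does.

```python
import unicodedata

def level0_clusters(text):
    """Toy model: each mark takes the cluster of the preceding base."""
    clusters = []
    current = 0
    for index, ch in enumerate(text):
        # General categories Mn, Mc and Me denote combining marks.
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark or index == 0:
            current = index  # a base character starts a new cluster
        clusters.append(current)
    return clusters

sheva_first = "\u05D1\u05B0\u05BC"   # BET, SHEVA, DAGESH
dagesh_first = "\u05D1\u05BC\u05B0"  # BET, DAGESH, SHEVA

print(level0_clusters(sheva_first))
print(level0_clusters(dagesh_first))
```

Both calls print [0, 0, 0]: under this model the two input orders
yield the same single cluster, so there is nothing for a cluster
reversal to reorder.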

The problem was a different one.  The puzzle had two parts:

  . I used the Courier New font, which evidently has no OpenType
    features for the 'hebr' script in its GSUB and GPOS tables.  If I
    use a font that does have those features, e.g., Symbola, the
    problem doesn't happen.

  . For fonts that have no 'hebr' features, Emacs performs
    substitution of known precomposed characters before it invokes the
    shaping engine.  In this case, it substituted U+FB31 for the
    sequence U+05D1,U+05BC, and passed the sequence U+FB31,U+05B0 to
    HarfBuzz.
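
The substitution step described in the second item can be sketched in
Python.  This is not the Emacs code: PRECOMPOSED and
substitute_precomposed are hypothetical names, and the sketch assumes
the substitution fires only when the base and the mark are adjacent in
the text, which is an assumption made here for illustration.

```python
import unicodedata

# Build the table from the Unicode data: U+FB31 decomposes
# canonically to U+05D1 U+05BC.
BET_WITH_DAGESH = "\uFB31"
decomposed = "".join(
    chr(int(cp, 16))
    for cp in unicodedata.decomposition(BET_WITH_DAGESH).split()
)
PRECOMPOSED = {decomposed: BET_WITH_DAGESH}

def substitute_precomposed(text):
    """Replace each known base+mark sequence by its precomposed form."""
    for sequence, precomposed in PRECOMPOSED.items():
        text = text.replace(sequence, precomposed)
    return text

def codepoints(text):
    """Format a string as a comma-separated list of U+XXXX values."""
    return ",".join("U+%04X" % ord(ch) for ch in text)

print(codepoints(substitute_precomposed("\u05D1\u05BC\u05B0")))
print(codepoints(substitute_precomposed("\u05D1\u05B0\u05BC")))
```

With DAGESH typed first, the first call prints U+FB31,U+05B0, the
sequence that was passed to HarfBuzz.  With SHEVA typed first, the
text is left unchanged under this assumption, which would be
consistent with only one input order showing the problem.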

It turned out there was a subtle bug in the code that uses the
information returned by HarfBuzz, and this use case triggers it: the
TO value of the LGLYPH object was computed in a way that confused the
Emacs display engine.  Fixing that computation resolved the problem.

Thanks.

