[HarfBuzz] Hebrew composition to presentation forms

Scott Fleischman scott.fleischman at logos.com
Mon Feb 3 10:11:37 CET 2014

Regarding the standard, and as Jonathan mentioned, it is my understanding that the Hebrew presentation forms are in the Composition Exclusion Table. http://www.unicode.org/reports/tr15/#Primary_Exclusion_List_Table

Quoting that URL:
There are four classes of canonically decomposable characters that are excluded from composition:

Script-specifics: canonically decomposable characters that are generally not the preferred form for particular scripts.

The Hebrew presentation forms fall under the script specifics. See http://www.unicode.org/Public/6.3.0/ucd/CompositionExclusions.txt

It's striking to me that the presentation forms are "canonically decomposable", yet also "generally not the preferred form" and "excluded from composition." I think that makes sense for the Hebrew presentation forms, because the vowel placement depends on the additional marks.
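To make the "canonically decomposable, yet excluded from composition" point concrete, here is a minimal sketch using Python's standard unicodedata module (my choice of tool for illustration, not something HarfBuzz uses). It shows that U+FB2F decomposes under NFD but is never produced by NFC:

```python
import unicodedata

ALEF_QAMATS = "\uFB2F"        # HEBREW LETTER ALEF WITH QAMATS (presentation form)
DECOMPOSED = "\u05D0\u05B8"   # alef followed by qamats


def codepoints(s):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


# The presentation form carries a canonical decomposition in the UCD.
print(unicodedata.decomposition(ALEF_QAMATS))

# NFD expands the presentation form into alef + qamats.
print(codepoints(unicodedata.normalize("NFD", ALEF_QAMATS)))

# NFC does not recompose the pair, because FB2F is composition-excluded.
print(codepoints(unicodedata.normalize("NFC", DECOMPOSED)))

# NFC of the presentation form itself also ends up decomposed.
print(codepoints(unicodedata.normalize("NFC", ALEF_QAMATS)))
```

All four lines print the same sequence, 05D0 05B8, which is why any composition to FB2F has to be a deliberate shaper decision rather than a side effect of normalization.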

For instance, 05D0 05B8 makes sense as FB2F--there isn't a distinguishable difference. However, once you add meteg (05BD) into the mix: 05D0 05B8 05BD, I can't see how you can render it correctly once it is composed as FB2F 05BD. The vowel is now in the wrong position; it should be moved further to the right so that there is space for the meteg under the consonant.
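As a small illustration of the sequences involved, the sketch below (again Python's unicodedata, used here only for inspection) checks that FB2F 05BD is canonically equivalent to 05D0 05B8 05BD and lists each character's canonical combining class. Both marks are below-base points with different classes, so the shaper sees two separate marks competing for the space under the alef:

```python
import unicodedata

composed_then_meteg = "\uFB2F\u05BD"      # alef-with-qamats form, then meteg
fully_decomposed = "\u05D0\u05B8\u05BD"   # alef, qamats, meteg


def describe(s):
    """Return (code point, canonical combining class) for each character."""
    return [(f"{ord(ch):04X}", unicodedata.combining(ch)) for ch in s]


# The two spellings are canonically equivalent: NFD maps one onto the other.
equivalent = unicodedata.normalize("NFD", composed_then_meteg) == fully_decomposed
print(equivalent)

# Qamats has combining class 18 and meteg has 22; the base letter has 0.
print(describe(fully_decomposed))
```

Once the qamats is baked into the FB2F glyph, only the meteg remains as a mark to be positioned, and the font can no longer shift the vowel to make room for it.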

This situation occurs often enough. It seems that it would occur anywhere that a mark is in the same location as the vowel (such as below the consonant). For various examples, see: https://bugs.freedesktop.org/attachment.cgi?id=78621 and https://bugs.freedesktop.org/attachment.cgi?id=78614

Would the best way to avoid these compositions be adding a decompose function in hb-ot-shape-complex-hebrew.cc which decomposes the presentation forms when followed (or preceded) by other marks?

Would checking for GPOS be better for this purpose?

The more places I see HarfBuzz being used, the sadder it makes me to see these mark placement issues. I have even used it as a test to see whether an application is using HarfBuzz behind the scenes. A friend was showing off how great Hebrew looked on the latest Android. And indeed it looked much better than earlier versions. Just earlier I had read that Android was using HarfBuzz, so I had him pull up Genesis 1:1. Sure enough, the meteg was out of place on the last word, looking more like https://bugs.freedesktop.org/attachment.cgi?id=78401 than https://bugs.freedesktop.org/attachment.cgi?id=78400

I hope there is a better solution than always composing these forms. I'd be happy with composing only when the one vowel in the presentation form is present and there are no other marks. That seems to address the needs of older fonts without interfering with the placement of additional marks. I think that would only work within the Hebrew script, though.
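The "compose only when there is exactly one mark" idea can be sketched roughly as follows. This is a Python illustration of the policy, not HarfBuzz code: the function names build_compositions and compose_lone_marks are made up for this example, the table is derived from the canonical decompositions in FB1D..FB4E, and the input is run through NFD first for simplicity (a real shaper would not normalize the whole string this way).

```python
import unicodedata


def build_compositions():
    """Map (base, mark) -> presentation form for the range FB1D..FB4E."""
    table = {}
    for cp in range(0xFB1D, 0xFB4F):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility-only one
        parts = [chr(int(p, 16)) for p in decomp.split()]
        if len(parts) == 2:
            table[(parts[0], parts[1])] = chr(cp)
    return table


def is_mark(ch):
    """Treat any character in general category M* as a combining mark."""
    return unicodedata.category(ch).startswith("M")


def compose_lone_marks(text, table):
    """Compose base + mark only when the mark is the sole mark on its base."""
    text = unicodedata.normalize("NFD", text)  # simplification for this sketch
    out = []
    i = 0
    while i < len(text):
        # A cluster is one character plus every mark that follows it.
        j = i + 1
        while j < len(text) and is_mark(text[j]):
            j += 1
        cluster = text[i:j]
        if len(cluster) == 2 and (cluster[0], cluster[1]) in table:
            out.append(table[(cluster[0], cluster[1])])
        else:
            out.append(cluster)  # zero marks or several marks: leave decomposed
        i = j
    return "".join(out)


TABLE = build_compositions()
lone = compose_lone_marks("\u05D0\u05B8", TABLE)               # alef + qamats
with_meteg = compose_lone_marks("\u05D0\u05B8\u05BD", TABLE)   # plus meteg
```

With this policy, alef + qamats alone becomes FB2F, while alef + qamats + meteg stays as three separate characters so that the font's mark positioning can handle both marks.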


From: harfbuzz-bounces+scott.fleischman=logos.com at lists.freedesktop.org [harfbuzz-bounces+scott.fleischman=logos.com at lists.freedesktop.org] on behalf of Jonathan Kew [jfkthame at googlemail.com]
Sent: Sunday, February 02, 2014 1:45 AM
To: Khaled Hosny; Harfbuzz
Subject: Re: [HarfBuzz] Hebrew composition to presentation forms

On 2/2/14 01:13, Khaled Hosny wrote:
> Hi,
> Someone reported an issue with the hireq placement under the yodh with
> Ezra SIL font[1]. When I checked this, it seemed to be because HarfBuzz
> is composing U+05D9 + U+05B4 to U+FB1D and the font has a glyph for
> U+FB1D that has a not so good placement for hireq.
> I thought this composition is result of Unicode normalisation, so
> HarfBuzz is doing the right thing, but the comment in
> hb-ot-shape-complex-hebrew.cc:75 indicates otherwise. I’m not very sure,
> but I feel this kind of composition should fit more into fallback
> shaping like done with Arabic and not something to be done
> unconditionally, WDYT?

This is a difficult call. Note that U+FB1D does have a canonical
decomposition to <U+05D9, U+05B4>; the comment in
hb-ot-shape-complex-hebrew.cc relates only to the fact that these Hebrew
presentation forms are excluded from the composition rules for NFC;
thus, both NFC and NFD representations use the decomposed sequence.

Nevertheless, the two representations *are* canonically equivalent, and
therefore it's appropriate that they should be rendered the same.

IMO, this is a font bug in Ezra SIL; if a font has positioning rules for
yod + hireq, and also has a precomposed yod-hireq glyph, the two should
look identical. A font that gives the impression that entering U+FB1D
will result in one appearance, while <U+05D9, U+05B4> will result in
something different, is misleading its users.

As for what harfbuzz should do: currently, it deliberately uses the
precomposed Hebrew presentation-form glyphs, because there are many
(generally older) fonts out there that lack good (or any) mark
positioning rules, and so decomposed sequences look terrible. Using the
presentation forms gives a much better result.

However, perhaps we should try to be more sophisticated, and do
something like "compose to the presentation forms if the font doesn't
have GPOS mark positioning; otherwise prefer decomposed sequences".
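As a rough sketch of that rule (in Python, purely for illustration), the decision could look like the function below. The FontInfo type, its has_gpos_mark and supported fields, and the choose_sequence function are all hypothetical names invented for this example; they are not HarfBuzz's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontInfo:
    """Hypothetical summary of the font properties the rule depends on."""
    has_gpos_mark: bool      # font provides GPOS mark positioning
    supported: frozenset     # code points for which the font has a glyph


def choose_sequence(base, mark, precomposed, font):
    """Pick what to shape for base + mark, given the font's capabilities."""
    if font.has_gpos_mark:
        # The font can position marks itself: keep the decomposed sequence.
        return base + mark
    if ord(precomposed) in font.supported:
        # No mark positioning, but a presentation-form glyph exists: use it.
        return precomposed
    # Neither is available: fall back to the decomposed sequence.
    return base + mark


old_font = FontInfo(has_gpos_mark=False, supported=frozenset({0x05D9, 0x05B4, 0xFB1D}))
new_font = FontInfo(has_gpos_mark=True, supported=frozenset({0x05D9, 0x05B4, 0xFB1D}))

old_result = choose_sequence("\u05D9", "\u05B4", "\uFB1D", old_font)
new_result = choose_sequence("\u05D9", "\u05B4", "\uFB1D", new_font)
```

Under these assumptions an older font without mark positioning gets U+FB1D, while a font with GPOS mark rules keeps <U+05D9, U+05B4> and positions the hireq itself.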

