[HarfBuzz] Possible failed positioning in Harfbuzz - and Uniscribe

tom.programs at gmail.com
Wed Oct 9 14:39:44 PDT 2013


OK, after an enjoyable detour into the source code I found that it is not a "bug" but rather a "feature" of Harfbuzz. The code responsible for that behaviour is in "hb-ot-layout-gpos-table.hh", line 1034 (function OT::MarkBasePosFormat1::apply):

      /* We only want to attach to the first of a MultipleSubst sequence.  Reject others. */
      if (0 == get_lig_comp (c->buffer->info[skippy_iter.idx])) break;

and the revision where the code was inserted is marked as:

     This is apparently what Uniscribe does.  Test case is:

       SEEN FATHA TEH ALEF

     with Arabic Typesetting.  Originally reported by Khaled Hosny.

This explains why I noticed this even in Uniscribe... Basically, this code refuses to attach a mark to a base that is not the first glyph produced by an earlier multiple substitution, and instead tries to attach the mark to the first glyph of that substitution.
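
To make the rule concrete, here is a minimal sketch of the base search in plain C++. It is a simplified model, not Harfbuzz's actual code: the Glyph struct, the find_base function and the first_component_only flag are hypothetical names invented for illustration. It assumes that marks are skipped while walking backwards, and that glyphs produced by a multiple substitution carry component numbers 0, 1, 2, ... in output order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a shaped glyph (hypothetical; not hb_glyph_info_t).
struct Glyph {
  std::string name;
  bool is_mark;       // true for combining marks
  unsigned lig_comp;  // 0 = first (or only) output of a substitution, 1.. = later outputs
};

// Walk backwards from the mark and return the index of the glyph chosen as
// its base, or -1 if none qualifies.
// first_component_only == true models the check quoted above;
// first_component_only == false models attaching to the nearest non-mark glyph.
int find_base(const std::vector<Glyph>& glyphs, std::size_t mark_index,
              bool first_component_only) {
  assert(mark_index < glyphs.size());
  for (std::size_t i = mark_index; i-- > 0;) {
    const Glyph& g = glyphs[i];
    if (g.is_mark) continue;                                // skip other marks
    if (first_component_only && g.lig_comp != 0) continue;  // reject later components
    return static_cast<int>(i);
  }
  return -1;
}
```

For the glyph run a, o (component 0), e (component 1), gravecomb, i from my test font quoted below, find_base(glyphs, 3, true) selects the "o", while find_base(glyphs, 3, false) selects the "e". If the selected glyph has no anchor for the mark, the positioning fails, which would match what I see.
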
However, in my case, this is not the desired behavior. I am writing Biblical and liturgical Hebrew. One of the most complete fonts to implement a proper treatment of the complex Hebrew diacritics is SBL Hebrew. Its layout intelligence was made open source, so there are many fonts (see the Culmus project) that implement it. However, in some particular cases (*), because of the way the font was designed, this positioning behavior in Harfbuzz misaligns the Hebrew letters and their diacritics.
Moreover, the rendering in Uniscribe actually differs from that of the current version of Harfbuzz (and, to my eyes, looks strange...). In Adobe InDesign, Fontforge, Mellel and ConTeXt Mark IV the rendering of SBL Hebrew is correct IMHO. To give a visual comparison, I am attaching screenshots of the rendering of the string קיָ֦גג in Uniscribe (Wordpad), Mellel (the most correct), the original version of Harfbuzz, and a version of Harfbuzz where I edited OT::MarkBasePosFormat1::apply.

I understand that Harfbuzz works this way for compatibility with Uniscribe; however, I would just like to know whether this positioning behavior is a settled design decision for Harfbuzz or something that is open to change in the future.


(*) Namely, in the string "קיָ֦" (qof+vav+qamats+merkha kefula), SBL Hebrew substitutes "hairspace+vav" for the vav after the qof, and then applies the "qamats+merkha" diacritics to the yod. Harfbuzz tries to attach them to the hairspace instead of the yod, and fails.

---

Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wordpad.png
Type: image/png
Size: 7910 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20131009/48152abb/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mellel.png
Type: image/png
Size: 9659 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20131009/48152abb/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: harfbuzz-original.png
Type: image/png
Size: 2418 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20131009/48152abb/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: harfbuzz-modified.png
Type: image/png
Size: 2415 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20131009/48152abb/attachment-0003.png>
-------------- next part --------------

On 07 Oct 2013, at 20:42, Rolf Langenhuijzen wrote:

> Hi Tom,
> 
> If I try your rtf with a simple test then it looks OK to me (see png).
> hb-view --output-file=m.png --font-size=100 minimal.ttf aoèièi
> 
> this is hb 0.9.21 with ot shaper
> 
> Rolf
> <m.png>
> 
> On Oct 7, 2013, at 1:13 AM, tom.programs at gmail.com wrote:
> 
>> I am just beginning to try Harfbuzz, but I am writing to you because I think that I might have found incorrect behavior when I have both a contextual chained substitution and a contextual chained positioning.
>> 
>> The problems occur when I have the following two rules:
>> 1 Substitute ["e"] with ["o" "e"] when preceded by an "a" (context: { ["a"]  |  } )
>> 2 Position the mark ["gravecomb"] anchoring it to the ["e"] when the mark is followed by an "i" (context: {  | "i"  } )
>> 
>> What I think I should see when I type ["a" "e" "gravecomb" "i" "e" "gravecomb" "i" ] should be something like [aoèièi] 
>> What I see is more like [aoeˋièi] (the first "gravecomb" is not anchored to the "e")
>> I used the characters "a", "e", "i", "o", "gravecomb" (U+0300), but the problem is not specific to those characters and persists even in right-to-left scripts. I found it while examining the font SBLHebrew and the string "קוָ֣".
>> 
>> I built a very minimal font that reproduces this problem with the latin characters I used for the example. I put online the Fontforge source <https://www.dropbox.com/s/a78cypqv3jgmaex/prova.sfd> and the ttf <https://www.dropbox.com/s/5hq1c5mdg4isvzo/minimal.ttf>
>> 
>> However, the fact that the problem is reproduced almost exactly on Uniscribe, and even in the Proofing tool of MS VOLT makes me wonder if it is a bug or not. The problem is not present on the shaping system of ConTeXt Mark IV and on Apple's TextEdit, so it is even more mysterious for me.
>> 
>> I also put the link of the (IMHO correct) rendering of Fontforge <http://s23.postimg.org/8w44n9b3v/Screenshot_from_2013_10_07_00_55_45.png> and of the rendering of hb-view <http://s14.postimg.org/p02lzc29t/Screenshot_from_2013_10_07_00_58_32.png> (in order to render it with "hb-view --language=dflt --features="calt,kern" '/home/mint/Desktop/minimal.ttf' aèièi", be aware that the è is composed of two characters,  U+0065 and U+0300, because the software tends to convert this sequence to the single U+00E8 character). The problem is not with the spacing (in my font the "gravecomb" has nonzero width, but it's a mark, so its width is somewhat undefined) but with the fact that the first accent is not attached to the first "e".
>> 
>> --
>> Tom
>> _______________________________________________
>> HarfBuzz mailing list
>> HarfBuzz at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
> 


