[HarfBuzz] Lemongrass HarfBuzz Hackfest, end of day 4

Behdad Esfahbod behdad at behdad.org
Mon Jul 23 05:57:40 PDT 2012


On 07/23/2012 05:57 AM, Rajeesh K Nambiar wrote:
> On Mon, Jul 23, 2012 at 9:45 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
>> On 07/23/2012 12:05 AM, Rajeesh K Nambiar wrote:
>>> Just tested your fix and the crash is gone. The only remaining issue
>>> with Malayalam is with pre-base 'Ra', RA+ZWJ and dot Reph.
>>
>> We know about dot-reph, have not implemented it yet (but should be easy).  But
>> pre-base Ra should work.  Can you send me a sequence?
> 
> From the in-tree test file, test cases 15, 19, 20, 53 and 54 all
> involve 'Ra'. Case 5 (കാര്‍ക്കോടകന്‍) involves 'RA+ZWJ' in the middle
> of the word. At present all are failing (interestingly, case 5 was
> fine during the crash ;-)).
> 'RA' in Malayalam needs to be treated as if it is post-base. I'm
> attaching images of a simple test case for 'ക്ത്ര' using Rachana font
> (hb-view and pango-view outputs).

This seems to be a font problem.  Ok.  Here is the sobering reality:

With the HarfBuzz Indic shaper, our goal is to track Uniscribe shaping as
closely as possible (as long as it makes sense).  The older shapers (Pango,
Old HarfBuzz, ICU, etc.) each had their differences from Uniscribe and from
each other.  Unfortunately, this means that fonts designed against them (as
opposed to fonts designed against Uniscribe) may be broken now.

Testing is simple: if HarfBuzz agrees with Uniscribe, it's a font bug.  If
HarfBuzz disagrees with Uniscribe, it's a HarfBuzz bug.

We have to make this distinction, otherwise 1) it's impossible to know whether
it's a font or shaper bug, and 2) there will be no spec to track and any
shaper we come up with will be quite arbitrary.
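
As a minimal sketch of the triage rule above (not anything in the HarfBuzz
tree): the function name and the glyph names are hypothetical, and it assumes
the rendering has already been judged wrong and that both shapers' glyph
output is available as plain lists.

```python
def triage(harfbuzz_glyphs, uniscribe_glyphs):
    """Decide where to file a report for a rendering already known to be wrong.

    Both arguments are lists of glyph names (hypothetical data), one list
    per shaper, for the same text and the same font.
    """
    if harfbuzz_glyphs == uniscribe_glyphs:
        # Same output from both shapers: the font's lookups are at fault.
        return "font bug"
    # Output differs from the reference shaper: the shaper is at fault.
    return "HarfBuzz bug"


# Made-up glyph names, for illustration only.
print(triage(["k_t", "ra_sign"], ["k_t", "ra_sign"]))
print(triage(["ra_sign", "k_t"], ["k_t", "ra_sign"]))
```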

In the case of 'ക്ത്ര' and Rachana, while I clearly see that our result is
wrong, Uniscribe agrees with us, so it ought to be a bug in the way the font
lookups are organized.


> Meanwhile, the latest commit segfaults again, as datao zhang reported.

Fixed already.


> Also noticed that post-base 'LA' is not correctly rendered for 'സ്പ്ലേ'.

This one is more interesting: we differ from Uniscribe, and I see why.

Jonathan, this is what's happening:

Both Rachana and Raghu have 'half' lookups that substitute C,H sequences with
glyphs that are essentially ligatures of an explicit halant on the consonant
(not really half forms, from what I can see).  As such, our L-matra
repositioning logic leaves matras to the left of such glyphs.  In other words,
since there is no explicit Halant glyph, the matra is not repositioned.  This
is exactly what the spec says, and it works for Devanagari.  But Uniscribe
seems to move it anyway.
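
To illustrate the behaviour described above, here is a toy model of the
repositioning step.  It is a sketch under stated assumptions, not the actual
HarfBuzz code: glyphs are plain strings, "M" stands for the left matra
(assumed to sit at the start of the syllable), "H" is an explicit halant
glyph, and names such as "C1_H" are hypothetical stand-ins for the C,H
ligatures produced by a font's 'half' lookup.

```python
def reposition_left_matra(glyphs, base_index):
    """Move a leading left matra to just after the last explicit halant
    found before the base; if there is none, leave the matra where it is.

    Toy model only: "M" is the matra, "H" is an explicit halant glyph.
    """
    if not glyphs or glyphs[0] != "M":
        return list(glyphs)

    # Find the last explicit halant between the matra and the base.
    target = None
    for i in range(1, base_index):
        if glyphs[i] == "H":
            target = i

    if target is None:
        # No explicit halant glyph survived: the matra is not repositioned.
        return list(glyphs)

    return list(glyphs[1:target + 1]) + ["M"] + list(glyphs[target + 1:])


# Halants still present as separate glyphs: the matra moves right.
print(reposition_left_matra(["M", "C1", "H", "C2", "H", "BASE"], 5))
# 'half' lookup ligated each C,H pair: nothing to anchor on, matra stays left.
print(reposition_left_matra(["M", "C1_H", "C2_H", "BASE"], 3))
```

In this model the second call returns its input unchanged, which matches the
result we currently produce for fonts whose 'half' lookups ligate the halant.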

Is it the case that Malayalam does not have half forms?  If so, that would
explain it, and we can adjust for this.  What other scripts do not have half
forms, BTW?

Cheers,
behdad


