[HarfBuzz] arabic presentation-forms shaping

Behdad Esfahbod behdad at behdad.org
Wed Apr 4 11:23:31 PDT 2012

Hi Jonathan,

Interesting.  In fact, I had to port my Arabic fallback-shaping logic from
FriBidi to Python last week and in the process was considering different
designs for HarfBuzz.

Until now I had two designs in mind:

1. Synthesized GSUB of *Unicode characters*, with a modified layout engine
that checks cmap before substituting.  This is what iculayout does.  I used to
like this approach, but I'm not a huge fan anymore because I think it's too
limiting and intrusive.

2. Just do it in a callback in the right place in the complex code.

Now your approach is new to me, and a very interesting one in fact.  I'm not
a huge fan, however, for a very simple reason: I don't like dynamic memory
allocation in hb and want to avoid it as much as possible.  When thread-safety
stuff comes in, it gets even uglier.

Given that the code needed to do the fallback is around 20 lines at this
point, I'm leaning towards just doing it.  Let me give this a try today.
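
For illustration, a Unicode-level fallback of this kind could look roughly
like the Python sketch below.  It is not the FriBidi or HarfBuzz code.  The
names build_form_table, fallback_shape and has_glyph are hypothetical;
joining behaviour is inferred from which forms exist in the Arabic
Presentation Forms-B block; marks and ligatures such as lam-alef are
ignored; and has_glyph stands in for the cmap check.

```python
import unicodedata

# Decomposition tags used by Arabic Presentation Forms-B, mapped to feature-like names.
FORMS = {"<isolated>": "isol", "<final>": "fina",
         "<initial>": "init", "<medial>": "medi"}


def build_form_table():
    """Map each base Arabic letter to its single-letter presentation forms."""
    table = {}
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only "<tag> BASE" entries; ligatures have more than one base.
        if len(parts) == 2 and parts[0] in FORMS:
            table.setdefault(int(parts[1], 16), {})[FORMS[parts[0]]] = cp
    return table


def fallback_shape(text, has_glyph, table=None):
    """Replace letters with presentation forms the font actually has.

    Assumption: a letter with an initial form joins on both sides, and a
    letter with only a final form joins to the preceding letter only.
    """
    table = table or build_form_table()
    cps = [ord(c) for c in text]
    pick = {(False, False): "isol", (True, False): "fina",
            (False, True): "init", (True, True): "medi"}
    out = []
    for i, cp in enumerate(cps):
        forms = table.get(cp)
        if not forms:
            out.append(cp)  # not a letter we know how to shape
            continue
        prev = table.get(cps[i - 1], {}) if i > 0 else {}
        nxt = table.get(cps[i + 1], {}) if i + 1 < len(cps) else {}
        joins_prev = "init" in prev and "fina" in forms
        joins_next = "init" in forms and "fina" in nxt
        shaped = forms.get(pick[(joins_prev, joins_next)], cp)
        # Only substitute when the font maps the presentation form.
        out.append(shaped if has_glyph(shaped) else cp)
    return "".join(chr(c) for c in out)
```

With a has_glyph callback that accepts everything,
fallback_shape("\u0628\u0627", lambda cp: True) gives the initial form of
BEH followed by the final form of ALEF.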

What fonts are you using for testing?


On 04/04/2012 12:29 PM, Jonathan Kew wrote:
> Hi Behdad,
> I was thinking about the issue of shaping legacy Arabic fonts using
> presentation-forms codepoints, when no GSUB features are available.
> ISTM that the hb_substitute_complex_fallback() placeholder currently in
> hb_ot_shape_execute_internal isn't really the right place to do this, because
> by that time hb_map_glyphs has replaced the original character codes with
> glyph IDs, so doing shaping directly in terms of Unicode codepoints isn't a
> convenient option any longer.
> An alternative I'd like to suggest is that, to minimize the amount of shaping
> code that needs to be maintained in two places or forms - one for shaping
> using the OpenType features and GSUB lookups, and one based on Unicode
> character codes - we could handle fonts like this by synthesizing an
> appropriate GSUB table on the fly, and then simply using the existing OT
> shaping code.
> To see how feasible this would be, I've hacked up some code to synthesize a
> basic Arabic GSUB, implementing the init/medi/fina/isol/rlig features. I'll
> attach it for your ... review? amusement? It seems to work fine, at least with
> the couple of fonts I've tried.
> What's still needed is to figure out a nice way to integrate this - if we're
> going to supply a synthetic GSUB, this needs to be done up front, before the
> shaping plan is compiled, so that the proper table is used both to collect the
> lookups and then to execute them. Maybe some kind of
> complex_shaper_preflight() call, where the shaper gets the chance to probe the
> font's table(s) and see if they support the script it requires, and provide an
> alternative if desired. This would happen before compiling the feature maps.
> (Note that we'll need to do this table-replacement dynamically depending on
> the script of the buffer being shaped, as the font might have a GSUB that
> supports Latin script but not Arabic, for example.)
> - JK
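
To make the synthesized-table idea quoted above concrete, here is a rough
Python sketch of the data such a table would carry.  It does not build a
real binary GSUB and is not Jonathan's attached code.  The function names,
the dict-based cmap, and the gsub_scripts set are all assumptions made for
illustration, and the rlig ligatures are left out.

```python
import unicodedata

# Decomposition tags in Arabic Presentation Forms-B, mapped to OpenType feature tags.
FEATURE_BY_TAG = {"<isolated>": "isol", "<final>": "fina",
                  "<initial>": "init", "<medial>": "medi"}


def synthesize_single_substitutions(cmap):
    """Build {feature: {base_glyph: form_glyph}} from a codepoint->glyph cmap.

    Working in glyph IDs means the result can be applied after characters
    have already been mapped to glyphs.
    """
    features = {tag: {} for tag in FEATURE_BY_TAG.values()}
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) != 2 or parts[0] not in FEATURE_BY_TAG:
            continue  # unassigned, or a multi-letter ligature
        base = int(parts[1], 16)
        # A substitution is only usable if the font maps both ends of it.
        if base in cmap and cp in cmap:
            features[FEATURE_BY_TAG[parts[0]]][cmap[base]] = cmap[cp]
    return features


def needs_synthetic_gsub(gsub_scripts, cmap):
    """Hypothetical preflight check, run before the shaping plan is compiled.

    gsub_scripts: set of script tags the font's real GSUB covers.
    """
    if "arab" in gsub_scripts:
        return False  # the font's own GSUB handles Arabic
    return any(synthesize_single_substitutions(cmap).values())
```

The preflight check is per script, so a font whose GSUB covers only Latin
would still get the synthetic table when the buffer being shaped is Arabic.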
