[HarfBuzz] harfbuzz work

Thu Aug 6 12:47:20 PDT 2009

On 5 Aug 2009, at 22:40, Behdad Esfahbod wrote:

> Hi Jonathan and others,
>
> Finally I have some more code to show.  I actually made my branch  
> into pango's master branch, so harfbuzz-ng lives in pango's master  
> branch for now.

Great, I'll take a look soon.

A few comments below, in response to yours.

>
>>> * Provided a small HB-friendly cmap-reader (currently handles formats
>>> 4 and 12 only).
>
> I thought a lot about whether we want to deal with cmap directly.   
> There are multiple reasons not to:
>
>  - fontconfig, for example, can handle non-Unicode cmaps by calling  
> iconv,

Hmm. Interesting, though I wonder how many fonts in the wild have  
*only* a non-Unicode cmap.

>
>  - For characters not supported by the font, we need to ask the  
> higher level what to do.  Pango uses special codes that are used to  
> draw hexboxes later.

That's not a concern for me, as we already detect unsupported  
characters (as a side-effect of the font-matching process). So we know  
about them already, and draw hexboxes for them independently of  
Harfbuzz.

>>>
>>> * Code to look up the Unicode character properties we're likely to
>>> need; currently script, bidi direction, and arabic joining type. This
>>> can be retrieved from the ICU property APIs, if the client is using
>>> ICU anyway, or there's a local implementation supporting just the
>>> properties needed in the layout process. Actually, as we don't do bidi
>>> within HarfBuzz, I'm not sure we need that property; on the other
>>> hand, we may need character types (combining marks, etc) for cluster
>>> handling - I haven't looked into that yet.
>
> Again, these can all be taken care of by what I'm currently calling  
> hb_unicode_callbacks_t.  For testing, I'd rather use glib's instead  
> of having scripts to extract them in yet another place.

I'm using (testing) this within Gecko, where we don't have glib, so I  
needed something else. I'm not saying this has to be the final word,  
but I needed something to drive the Arabic shaper, for example, as  
well as for the script itemizer (which I need whether or not it  
becomes part of harfbuzz).
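
To illustrate the callback idea discussed above, here is a minimal sketch in C. It is not the actual hb_unicode_callbacks_t definition; the struct, enum, and function names are hypothetical, and the local table covers only three Arabic code points (ALEF, BEH, TATWEEL) purely as an example. The point is that a client such as one using glib or ICU can plug in its own property lookup, while a client without either falls back to a small built-in table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the Unicode Joining_Type property values. */
typedef enum {
    JOIN_U, /* Non_Joining  */
    JOIN_R, /* Right_Joining */
    JOIN_D, /* Dual_Joining  */
    JOIN_C  /* Join_Causing  */
} joining_type_t;

/* Hypothetical callback table: the client may supply its own lookup
 * (for example one backed by ICU or glib), or leave the slot NULL. */
typedef struct {
    joining_type_t (*joining_type)(uint32_t unicode);
} unicode_callbacks_t;

/* Tiny local fallback. Illustrative only: a real table would be
 * generated from the Unicode data files and cover every character. */
static joining_type_t local_joining_type(uint32_t unicode)
{
    switch (unicode) {
    case 0x0627: return JOIN_R; /* ARABIC LETTER ALEF */
    case 0x0628: return JOIN_D; /* ARABIC LETTER BEH  */
    case 0x0640: return JOIN_C; /* ARABIC TATWEEL     */
    default:     return JOIN_U;
    }
}

/* Use the client's callback when present, else the local table. */
static joining_type_t get_joining_type(const unicode_callbacks_t *cb,
                                       uint32_t unicode)
{
    if (cb != NULL && cb->joining_type != NULL)
        return cb->joining_type(unicode);
    return local_joining_type(unicode);
}
```

With this shape, the Arabic shaper only ever calls get_joining_type() and does not need to know where the data comes from.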

>
>>> * Proposed shaping-function API (see hb-shaper.h) and two shaper
>>> implementations (generic and arabic/syriac/n'ko). These support
>>> user-specified features in addition to the defaults and
>>> script-specific shaping features. Oh, they also handle mirroring using
>>> the OMPL table, and apply ltra/rtla etc according to direction.
>
> Thanks.  I'll get to them soon.  Regarding OMPL, I'm of mixed mind.  I  
> personally prefer to use the latest Unicode mirroring properties  
> instead.  The idea of fixing on OMPL was stupid IMO.

I don't like it, either, but I'm inclined to support the standard as  
written.

>
>
>>> In the shaper API that I'm using right now, the approach is to
>>> initially fill the buffer with *character* codes, and the shaper
>>> function takes a pointer to a cmap table in addition to the layout
>>> record. I did this because shaping needs access to the Unicode values,
>>> not just the glyphs. I suppose we could specify that the cmap table
>>> can be NULL, in which case the buffer is assumed to contain glyph IDs
>>> already, but this will make most complex-script shaping impossible.
>>> (Actually, it's a problem even for the generic shaper, as it needs the
>>> Unicode character codes for mirroring.)
>
> That's kinda what I have in mind, yes.  I'm actually thinking of  
> hb_shape() calling the following four functions:
>
>  hb_substitute_default()  -> does cmap conversion
>  hb_substitute_complex()  -> does GSUB substitution
>
>  hb_position_default()    -> does default glyph-metrics positioning
>  hb_position_complex()    -> does GPOS positioning
>
> Better naming welcome.

Seems like a reasonable approach. You'll see that in the code I sent,  
I didn't actually do the default positioning step at all, so the  
result only contains the GPOS deltas. (Basically, I was wanting to  
check that the GPOS was being processed properly.)
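
The four-stage split can be sketched roughly as follows. This is only an outline under stated assumptions: the glyph_t and font_funcs_t types, the sketch_* function names, and the toy font are all hypothetical, and the two "complex" stages are left as placeholders where GSUB and GPOS processing would go. It also shows why skipping the default positioning step leaves only GPOS deltas in the result: the advances are filled in by that step alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer item: holds a Unicode value on input and a
 * glyph ID once the cmap stage has run. */
typedef struct {
    uint32_t codepoint;
    int32_t  x_advance;
    int32_t  x_offset;
    int32_t  y_offset;
} glyph_t;

/* Hypothetical font callbacks supplied by the client. */
typedef struct {
    uint32_t (*map_char)(uint32_t unicode, void *data); /* cmap lookup */
    int32_t  (*advance)(uint32_t glyph, void *data);    /* glyph metrics */
    void     *data;
} font_funcs_t;

/* Stage 1: cmap conversion, Unicode value to glyph ID. */
static void sketch_substitute_default(const font_funcs_t *f, glyph_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i].codepoint = f->map_char(buf[i].codepoint, f->data);
}

/* Stage 2: placeholder where GSUB lookups would be applied. */
static void sketch_substitute_complex(const font_funcs_t *f, glyph_t *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
}

/* Stage 3: default positioning from glyph metrics. */
static void sketch_position_default(const font_funcs_t *f, glyph_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i].x_advance = f->advance(buf[i].codepoint, f->data);
        buf[i].x_offset = 0;
        buf[i].y_offset = 0;
    }
}

/* Stage 4: placeholder where GPOS adjustments would be added on top. */
static void sketch_position_complex(const font_funcs_t *f, glyph_t *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
}

/* The driver simply runs the four stages in order. */
static void sketch_shape(const font_funcs_t *f, glyph_t *buf, size_t n)
{
    sketch_substitute_default(f, buf, n);
    sketch_substitute_complex(f, buf, n);
    sketch_position_default(f, buf, n);
    sketch_position_complex(f, buf, n);
}

/* Toy font for experimenting: 'A'..'Z' map to glyphs 1..26, anything
 * else to glyph 0, and every glyph has the same advance. */
static uint32_t toy_map_char(uint32_t unicode, void *data)
{
    (void)data;
    return (unicode >= 'A' && unicode <= 'Z') ? unicode - 'A' + 1 : 0;
}

static int32_t toy_advance(uint32_t glyph, void *data)
{
    (void)glyph; (void)data;
    return 600;
}
```

Note that the complex-script shapers would still need the original Unicode values after stage 1, so a real buffer would have to keep them alongside the glyph IDs; this sketch overwrites them for brevity.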

>>> One outstanding issue is passing parameters to features like 'aalt'
>>> (alternate substitution lookups). I see you have a "placeholder" for a
>>> callback function in AlternateSubstFormat1::apply, but this doesn't
>>> look quite sufficient AFAICT. In order to return the proper index, the
>>> function would need to know which feature is currently being
>>> processed, which is information that is not available at this level of
>>> applying the lookup. (Note that it would be possible for a run of text
>>> to have several Alternate features applied, with different indexes
>>> used for each of them.)
>
> I'm finally convinced that we don't want a callback approach.  I  
> think I have something in mind that may work.  The idea being, the  
> mask for the feature can have more than 1 bit on, and we use those  
> bits of the glyph property as a selection.  Makes sense?

Yes, this sounds like what I was trying to suggest. The complex-script  
shapers need to use some mask bits to control feature application  
(I've been working on an Indic shaper, too), so I've been allocating  
these from the high end of the mask, with the idea that the low bits  
can be used as an index by AlternateSubstFormat1 lookups. So a  
user-controlled feature is normally turned "on" by setting the low bit,  
but in the case of AltSubFmt1, setting any non-zero value in the low  
byte of the mask serves to not only turn it on but also select a  
substitution.
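
A rough sketch of that mask layout in C, to make the idea concrete. The macro and function names are hypothetical, and the convention that a selector value k picks the k-th alternate (with 0 meaning "feature off") is an assumption made for this example rather than something settled in this thread.

```c
#include <stdint.h>

/* Hypothetical layout of a 32-bit per-glyph mask:
 *   low byte  : user-controlled feature, on/off or an alternate selector
 *   high bits : allocated one at a time by the complex-script shapers */
#define USER_LOW_BYTE  0x000000FFu
#define SHAPER_BIT(n)  (0x80000000u >> (n)) /* n = 0 is the top bit */

/* Number of zero bits below the lowest set bit of the feature mask. */
static unsigned mask_shift(uint32_t feature_mask)
{
    unsigned shift = 0;
    if (feature_mask == 0)
        return 0;
    while ((feature_mask & 1u) == 0) {
        feature_mask >>= 1;
        shift++;
    }
    return shift;
}

/* Value of the feature for one glyph: 0 means the feature is off,
 * any other value means on (and doubles as a selector). */
static unsigned feature_value(uint32_t glyph_mask, uint32_t feature_mask)
{
    return (glyph_mask & feature_mask) >> mask_shift(feature_mask);
}

/* Alternate substitution driven by the selector. Assumed convention:
 * value k selects alternates[k - 1]; out-of-range values leave the
 * glyph unchanged. */
static uint16_t apply_alternate(uint16_t glyph, const uint16_t *alternates,
                                unsigned count, unsigned value)
{
    if (value == 0 || value > count)
        return glyph;
    return alternates[value - 1];
}
```

Because the selector travels in the glyph's own mask, the lookup code does not need to know which feature is being processed, and different runs can carry different selector values for the same feature.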

JK