[HarfBuzz] On fallback shaping and future directions

Khaled Hosny khaledhosny at eglug.org
Mon Nov 22 11:44:24 PST 2010


On Mon, Nov 22, 2010 at 01:52:57PM -0500, Behdad Esfahbod wrote:
[...]
>   - Missing GPOS:  Many Arabic fonts, especially those from Microsoft, do not
> have a GPOS table.  This could all be just fine, except that many of them do
> not have zero advance width for the marks either.  So, circumvention is
> necessary.  Pango used to have API to simply zero advance width of marks and
> the Arabic module simply used that API.  Note that it's in general not safe to
> zero advance width of all combining marks as there are legitimate cases for
> marks with positive advance width.  I'll cover this later in the non-Arabic
> discussion, but worst case, we can add a post-positioning hook and use it in
> the Arabic module to zero mark advances.

I've seen fonts (not sure if they were MS fonts) that still do mark
positioning using the GSUB 'mset' feature, so such cases need to be
considered.
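A minimal sketch of the post-positioning hook Behdad describes, assuming a hypothetical glyph record (`shaped_glyph_t`) and hypothetical function names rather than any real Pango or HarfBuzz API. It zeroes advances only for Arabic nonspacing marks and only when the font lacks GPOS, so marks in other scripts that legitimately advance are untouched. The mark list is a partial, hand-written subset of Unicode general category Mn. The sketch makes no attempt to detect fonts that position marks through the GSUB 'mset' feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical glyph record; not HarfBuzz's actual buffer types. */
typedef struct {
    uint32_t codepoint;  /* Unicode character the glyph came from */
    int32_t  x_advance;  /* horizontal advance after positioning */
} shaped_glyph_t;

/* Partial list of Arabic nonspacing marks (general category Mn).
   A real shaper would query the Unicode Character Database instead. */
static bool is_arabic_nonspacing_mark(uint32_t u)
{
    return (u >= 0x0610 && u <= 0x061A) ||
           (u >= 0x064B && u <= 0x065F) ||
           u == 0x0670 ||
           (u >= 0x06D6 && u <= 0x06DC) ||
           (u >= 0x06DF && u <= 0x06E4) ||
           (u >= 0x06E7 && u <= 0x06E8) ||
           (u >= 0x06EA && u <= 0x06ED);
}

/* Hypothetical hook called by the Arabic shaper only: when the font has
   no GPOS table, force the advances of Arabic marks to zero. */
static void arabic_zero_mark_advances(shaped_glyph_t *glyphs, size_t len,
                                      bool font_has_gpos)
{
    assert(glyphs != NULL || len == 0);
    if (font_has_gpos)
        return;  /* the font positions its own marks */
    for (size_t i = 0; i < len; i++)
        if (is_arabic_nonspacing_mark(glyphs[i].codepoint))
            glyphs[i].x_advance = 0;
}
```

Keeping the check inside the Arabic path, instead of zeroing every combining mark globally, follows the caveat in the quoted text.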

>   - Missing GSUB:  Such fonts can be handled by fallback to using the
> presentation forms encoded in Unicode, but since Pango never did that, I
> wouldn't consider it a high priority, or even something that we should support
> ever.

This could be a useful addition, though. Lucida fonts (distributed with
certain versions of Java) lack a GSUB table, and one of the most common
complaints after installing Sun's JRE is broken Arabic text on web
pages. Both Qt and OpenOffice (ICU?) can shape such fonts (falling back
to presentation forms, I assume), so it would be nice if future versions
of Pango/HB could do that too.
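A rough sketch of what a presentation-forms fallback could look like, assuming the usual Arabic joining rules (right-joining vs dual-joining letters, harakat transparent to joining) and a tiny hypothetical table covering only five letters. It maps each letter to its Arabic Presentation Forms-B codepoint (isolated, final, initial, medial); a real fallback would then look those codepoints up in the font's cmap. It deliberately ignores the mandatory lam-alef ligature, ZWJ/ZWNJ and most of the alphabet, so it is not how Qt or ICU actually do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { JT_NONE, JT_RIGHT, JT_DUAL, JT_TRANSPARENT } joining_t;
typedef enum { FORM_ISOL, FORM_FINA, FORM_INIT, FORM_MEDI } form_t;

/* Hypothetical table: base letter, joining type, and Presentation Forms-B
   codepoints in form_t order; 0 means the letter has no such form. */
typedef struct { uint32_t base; joining_t jt; uint32_t forms[4]; } arabic_letter_t;

static const arabic_letter_t LETTERS[] = {
    {0x0627, JT_RIGHT, {0xFE8D, 0xFE8E, 0, 0}},           /* ALEF */
    {0x0628, JT_DUAL,  {0xFE8F, 0xFE90, 0xFE91, 0xFE92}}, /* BEH  */
    {0x062F, JT_RIGHT, {0xFEA9, 0xFEAA, 0, 0}},           /* DAL  */
    {0x0644, JT_DUAL,  {0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0}}, /* LAM  */
    {0x0645, JT_DUAL,  {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}}, /* MEEM */
};

static const arabic_letter_t *lookup(uint32_t u)
{
    for (size_t i = 0; i < sizeof LETTERS / sizeof LETTERS[0]; i++)
        if (LETTERS[i].base == u)
            return &LETTERS[i];
    return NULL;
}

static joining_t joining_type(uint32_t u)
{
    if (u >= 0x064B && u <= 0x0652)  /* harakat do not break joining */
        return JT_TRANSPARENT;
    const arabic_letter_t *l = lookup(u);
    return l ? l->jt : JT_NONE;
}

/* Copy in[0..len) (logical order) to out[], replacing each known letter
   by its contextual presentation form; everything else is copied as-is. */
static void fallback_presentation_forms(const uint32_t *in, uint32_t *out,
                                        size_t len)
{
    assert(len == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < len; i++) {
        out[i] = in[i];
        const arabic_letter_t *cur = lookup(in[i]);
        if (!cur)
            continue;

        /* Nearest non-transparent neighbours on each side. */
        joining_t prev = JT_NONE, next = JT_NONE;
        for (size_t j = i; j-- > 0;)
            if ((prev = joining_type(in[j])) != JT_TRANSPARENT)
                break;
        for (size_t j = i + 1; j < len; j++)
            if ((next = joining_type(in[j])) != JT_TRANSPARENT)
                break;

        /* A previous letter can only reach us if it is dual-joining;
           we can only reach the next letter if we are dual-joining. */
        bool joins_prev = prev == JT_DUAL;
        bool joins_next = cur->jt == JT_DUAL &&
                          (next == JT_DUAL || next == JT_RIGHT);
        form_t f = joins_prev ? (joins_next ? FORM_MEDI : FORM_FINA)
                              : (joins_next ? FORM_INIT : FORM_ISOL);
        if (cur->forms[f] != 0)
            out[i] = cur->forms[f];
    }
}
```

Writing into a separate output buffer keeps the joining context of later characters intact while earlier ones are being replaced.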

[...]
>   - ZWNJ/ZWJ, etc:  Pango also used to remove these characters from the glyph
> stream.  Not actually removing them in fact, but replacing them with "empty"
> glyphs.  That's ok for Arabic, but in general we should handle those two
> special characters much better.  In particular, the OpenType engine should
> simply ignore ZWJ when forming ligatures.  In fact, a wishful reading of the
> Unicode and OpenType specs suggests that maybe we should turn 'dlig' feature
> on for characters before ZWJ.  How does that sound?  Regardless, we have to
> get to the Indic shaper to see what exactly we have to do with these two.

I think there should be some way to optionally retain the glyphs for
ZWNJ/ZWJ (and directional formatting characters), as that can be very
useful when editing text which uses formatting characters frequently.
UI toolkits could then use it to enable/disable a 'show control
characters' mode.
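One way the optional behaviour could be exposed is a flag on a post-shaping pass, sketched here with a hypothetical glyph record and hypothetical function names (not an existing Pango or HarfBuzz API). By default, ZWNJ, ZWJ and a subset of the directional formatting characters get an "empty" glyph with zero advance, as Pango did; with `show_controls` set, their glyphs are left for the toolkit to render visibly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical glyph record. */
typedef struct {
    uint32_t codepoint;  /* source Unicode character */
    uint32_t glyph_id;   /* glyph chosen by the shaper */
    int32_t  x_advance;  /* horizontal advance */
} glyph_rec_t;

/* Subset of invisible format controls: ZWNJ, ZWJ, LRM, RLM and the
   embedding/override characters LRE, RLE, PDF, LRO, RLO. */
static bool is_format_control(uint32_t u)
{
    return (u >= 0x200C && u <= 0x200F) ||
           (u >= 0x202A && u <= 0x202E);
}

/* Hide format controls behind an empty, zero-width glyph unless the
   caller is in a hypothetical "show control characters" mode. */
static void handle_format_controls(glyph_rec_t *g, size_t len,
                                   uint32_t empty_glyph, bool show_controls)
{
    assert(g != NULL || len == 0);
    if (show_controls)
        return;  /* keep the glyphs so the toolkit can draw them */
    for (size_t i = 0; i < len; i++)
        if (is_format_control(g[i].codepoint)) {
            g[i].glyph_id = empty_glyph;
            g[i].x_advance = 0;
        }
}
```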

[...]
>   - Incorrect GDEF: For sure GDEF is an absolutely necessary table for any
> well-crafted OpenType font, not the least because of the mark attachment
> classes and mark glyph sets.  But the regular glyph classes are of much less
> importance than they initially suggest.  In particular, we only care about
> mark vs non-mark classes.  I wonder if we should use Unicode general category
> to *adjust* GDEF mark classes.  That is, always mark a glyph as a mark class
> if general category suggests so, even if the GDEF class says non-mark.

AFAIK Uniscribe does that, but I know at least one font developer who
curses MS day and night just because of that (he builds Arabic fonts,
if it makes any difference). If we ignore the GDEF table in such cases,
we leave the font developer with no way to override the standard glyph
classes, even when he intentionally and purposefully wants to do so.
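The two positions can be reconciled with an explicit policy, sketched below under assumptions: the function and enum names are hypothetical, `unicode_is_mark` stands for "general category is Mn, Mc or Me" as computed by the caller, and falling back to Unicode when the font has no GDEF at all is an illustrative choice. The class values are the GDEF GlyphClassDef values from the OpenType spec.

```c
#include <assert.h>
#include <stdbool.h>

/* GDEF GlyphClassDef values (OpenType spec); 0 means unclassified. */
enum { GDEF_UNCLASSIFIED = 0, GDEF_BASE = 1, GDEF_LIGATURE = 2,
       GDEF_MARK = 3, GDEF_COMPONENT = 4 };

/* Hypothetical policy switch. */
typedef enum {
    GDEF_POLICY_TRUST_FONT,    /* the font's GDEF class always wins */
    GDEF_POLICY_UNICODE_MARKS  /* Unicode mark category can promote to mark */
} gdef_policy_t;

static unsigned effective_glyph_class(bool font_has_gdef, unsigned gdef_class,
                                      bool unicode_is_mark,
                                      gdef_policy_t policy)
{
    assert(gdef_class <= GDEF_COMPONENT);
    if (!font_has_gdef)  /* no table: synthesize from Unicode */
        return unicode_is_mark ? GDEF_MARK : GDEF_BASE;
    if (policy == GDEF_POLICY_UNICODE_MARKS && unicode_is_mark)
        return GDEF_MARK;  /* the "adjust GDEF" proposal */
    return gdef_class;     /* respect the font developer's choice */
}
```

Under `GDEF_POLICY_TRUST_FONT`, a developer who deliberately classifies a Unicode mark as a base glyph keeps that choice; the Unicode override only ever promotes a glyph to mark, never demotes one.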

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer
