[HarfBuzz] On fallback shaping and future directions

Behdad Esfahbod behdad at behdad.org
Mon Nov 22 12:05:38 PST 2010


On 11/22/10 14:44, Khaled Hosny wrote:
> On Mon, Nov 22, 2010 at 01:52:57PM -0500, Behdad Esfahbod wrote:
> [...]
>>   - Missing GPOS:  Many Arabic fonts, especially those from Microsoft, do not
>> have a GPOS table.  This could all be just fine, except that many of them do
>> not have zero advance width for the marks either.  So, circumvention is
>> necessary.  Pango used to have API to simply zero advance width of marks and
>> the Arabic module simply used that API.  Note that it's in general not safe to
>> zero advance width of all combining marks as there are legitimate cases for
>> marks with positive advance width.  I'll cover this later in the non-Arabic
>> discussion, but worst case, we can add a post-positioning hook and use it in
>> the Arabic module to zero mark advances.
> 
> I've seen fonts (not sure whether they were MS fonts) that still do mark
> positioning using the GSUB 'mset' feature, so such cases need to be
> considered.

We apply 'mset' by default, so that's not an issue.
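
For the post-positioning hook mentioned in the quoted text, here is a rough
sketch of what zeroing mark advances could look like. It is only an
illustration: the (codepoint, x_advance) tuples are a made-up stand-in for real
glyph records, not a Pango or HarfBuzz type, and it assumes one glyph per
character. Spacing marks (general category Mc) are deliberately left alone,
since those are the legitimate cases with positive advance width.

```python
import unicodedata

def zero_mark_advances(glyphs):
    """Return a copy of `glyphs` with the advance of combining marks zeroed.

    `glyphs` is a list of hypothetical (codepoint, x_advance) tuples.
    Only nonspacing (Mn) and enclosing (Me) marks are touched.
    """
    out = []
    for codepoint, x_advance in glyphs:
        if unicodedata.category(chr(codepoint)) in ("Mn", "Me"):
            x_advance = 0
        out.append((codepoint, x_advance))
    return out
```

A shaper would call something like this only when the font has no GPOS table.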


>>   - Missing GSUB:  Such fonts can be handled by fallback to using the
>> presentation forms encoded in Unicode, but since Pango never did that, I
>> wouldn't consider it a high priority, or even something that we should support
>> ever.
> 
> This can be a useful addition, though. Lucida fonts (distributed with
> certain versions of Java) lack a GSUB table, and one of the most common
> complaints after installing Sun's JRE is broken Arabic text on web
> pages. Both Qt and OpenOffice (ICU?) can shape such fonts (falling back
> to presentation forms, I assume), so it would be nice if future versions
> of Pango/HB can do that.

Right.  I used to get reports about Lucida as shipped by the Sun JRE regularly,
but they have slowed down in the last couple of years.  Arabic fallback to
presentation forms is an afternoon's hacking, so it won't be a big deal.  I'll
keep it in mind.
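
To give an idea of the scale of the work, here is a minimal sketch of such a
fallback. It is not how Pango, Qt or ICU do it; it is one possible approach,
under these assumptions: the form table is derived from the Unicode
compatibility decompositions of the Arabic Presentation Forms-B block, a letter
is treated as dual-joining if it has an initial form there and as right-joining
if it only has a final form, and combining marks, tatweel and lam-alef
ligatures are ignored entirely.

```python
import unicodedata

def build_forms():
    """Map each base letter to its {form name: presentation form} table."""
    forms = {}
    for cp in range(0xFE70, 0xFEFD):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only single-letter mappings such as "<initial> 0628".
        if len(parts) == 2 and parts[0] in (
                "<isolated>", "<final>", "<initial>", "<medial>"):
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return forms

FORMS = build_forms()

def shape(text):
    """Replace Arabic letters in logical-order `text` with presentation forms."""
    out = []
    for i, ch in enumerate(text):
        table = FORMS.get(ch)
        if table is None:
            out.append(ch)  # not a letter we know how to shape
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # We join backwards if the previous letter can join forwards,
        # and forwards if we have an initial form and the next letter joins.
        joins_prev = "final" in table and "initial" in FORMS.get(prev, {})
        joins_next = "initial" in table and "final" in FORMS.get(nxt, {})
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        out.append(table.get(form, ch))
    return "".join(out)
```

For example, shape("\u0628\u0627\u0628") (beh, alef, beh) gives beh initial,
alef final, beh isolated, because alef does not join to the letter after it.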


> [...]
>>   - ZWNJ/ZWJ, etc:  Pango also used to remove these characters from the glyph
>> stream.  Not actually removing them in fact, but replacing them with "empty"
>> glyphs.  That's ok for Arabic, but in general we should handle those two
>> special characters much better.  In particular, the OpenType engine should
>> simply ignore ZWJ when forming ligatures.  In fact, a wishful reading of the
>> Unicode and OpenType specs suggests that maybe we should turn 'dlig' feature
>> on for characters before ZWJ.  How does that sound?  Regardless, we have to
>> get to the Indic shaper to see what exactly we have to do with these two.
> 
> I think there should be some way to optionally retain the glyphs for
> ZWNJ/ZWJ (and directional formatting characters) as it can be very useful
> when editing text which uses formatting characters frequently. That can
> be used by UI toolkits to enable/disable a 'show control characters'
> mode.

Right.  That's also something I have to figure out.
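
One shape this could take, as a sketch only: the empty glyph id, the
`nominal_glyph` callback and the `show_controls` flag are all hypothetical
names, and the set of characters is just ZWNJ/ZWJ plus the directional
formatting characters mentioned above.

```python
# ZWNJ, ZWJ, LRM, RLM, then LRE, RLE, PDF, LRO, RLO (U+202A..U+202E).
HIDDEN_CONTROLS = {0x200C, 0x200D, 0x200E, 0x200F} | set(range(0x202A, 0x202F))

EMPTY_GLYPH = 0xFFFF  # hypothetical id for a zero-width, inkless glyph

def map_controls(codepoints, nominal_glyph, show_controls=False):
    """Map codepoints to glyph ids, hiding control characters by default.

    `nominal_glyph` is a caller-supplied function (codepoint -> glyph id),
    standing in for a cmap lookup.
    """
    glyphs = []
    for cp in codepoints:
        if cp in HIDDEN_CONTROLS and not show_controls:
            glyphs.append(EMPTY_GLYPH)
        else:
            glyphs.append(nominal_glyph(cp))
    return glyphs
```

The characters stay in the glyph stream either way, so a toolkit could flip the
flag to get a 'show control characters' mode without reshaping differently.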



> [...]
>>   - Incorrect GDEF: For sure GDEF is an absolutely necessary table for any
>> well-crafted OpenType font, not the least because of the mark attachment
>> classes and mark glyph sets.  But the regular glyph classes are of much less
>> importance than they initially suggest.  In particular, we only care about
>> mark vs non-mark classes.  I wonder if we should use Unicode general category
>> to *adjust* GDEF mark classes.  That is, always mark a glyph as a mark class
>> if general category suggests so, even if the GDEF class says non-mark.
> 
> AFAIK Uniscribe does that, but I know at least one font developer who is
> cursing MS day and night just because of that (he is building Arabic
> fonts, if it makes any difference.) If we ignore GDEF table in such case
> we leave the font developer with no way to override standard glyph
> classes even if he intentionally and purposefully wants to do so.

That's why I suggested that we only do so if there are no mark attachment classes.
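
In sketch form, the rule would be something like the following. The class
numbers are the GDEF glyph class values from the OpenType spec; the function
name and the `has_mark_attach_classes` parameter are made up for illustration,
and whether spacing marks (Mc) should count too is left open here.

```python
import unicodedata

# GDEF glyph class values as defined by OpenType.
GDEF_BASE, GDEF_LIGATURE, GDEF_MARK, GDEF_COMPONENT = 1, 2, 3, 4

def effective_glyph_class(gdef_class, codepoint, has_mark_attach_classes):
    """Return the glyph class to use for a glyph mapped from `codepoint`.

    If the font defines mark attachment classes, assume the designer knows
    what they are doing and trust GDEF as is. Otherwise promote the glyph
    to the mark class when Unicode says the character is a mark.
    """
    if has_mark_attach_classes:
        return gdef_class
    if unicodedata.category(chr(codepoint)) in ("Mn", "Me"):
        return GDEF_MARK
    return gdef_class
```

That leaves font developers a way to override the standard classes: any font
that ships mark attachment classes is never second-guessed.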


> Regards,
>  Khaled


Thanks for the feedback,
behdad


