[HarfBuzz] Unified Text Layout Engine?

Thu Feb 1 12:20:48 PST 2007

On 01.02.2007 at 20:38, Eric Mader wrote:

> Hi Behdad,
>
> The ICU LayoutEngine uses an abstract base class to represent fonts  
> and so is independent of any particular font format or OS. (Though  
> the model in the base class assumes that a font contains tables w/  
> four-byte names. ;-)
>
> I've thought for some time that the Indic shaper code in ICU is too  
> fragile. When I wrote it, I thought I could get away with a single  
> routine to analyze and tag all Indic scripts. Since then, I've  
> found out that there are lots of script-specific exceptions and  
> it's sometimes hard to fix one script without breaking any of the  
> others...
>
> A couple of years ago I talked w/ Owen Taylor about rewriting the  
> code to have a (potentially) different shaper for each script. At  
> that time I thought that almost all the bugs were fixed and it  
> wasn't worth the effort. Subsequent experience has shown this  
> belief to be optimistic. :-)
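Eric's description of the LayoutEngine's font abstraction (a base class that hides the font format and the OS, exposing tables by four-byte names) can be illustrated with a minimal Python sketch. This is an assumption-based illustration, not ICU's C++ API: the names `AbstractFont`, `InMemoryFont`, `make_tag`, `get_table` and `map_char_to_glyph` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


def make_tag(name: str) -> int:
    """Pack a four-character table name such as 'GSUB' into a 32-bit integer."""
    if len(name) != 4:
        raise ValueError("table tags must be exactly four characters")
    a, b, c, d = (ord(ch) for ch in name)
    return (a << 24) | (b << 16) | (c << 8) | d


class AbstractFont(ABC):
    """Hypothetical format-independent font interface: the layout code only
    asks for raw tables and glyph IDs, never for OS- or format-specific objects."""

    @abstractmethod
    def get_table(self, tag: int) -> Optional[bytes]:
        """Return the raw bytes of the table with this tag, or None if absent."""

    @abstractmethod
    def map_char_to_glyph(self, code_point: int) -> int:
        """Return the glyph ID for a code point (0 here means 'no glyph')."""


class InMemoryFont(AbstractFont):
    """Hypothetical toy backend: tables and character map are plain dictionaries."""

    def __init__(self, tables, cmap):
        # Store tables keyed by their packed four-byte tag.
        self._tables = {make_tag(name): data for name, data in tables.items()}
        self._cmap = cmap

    def get_table(self, tag):
        return self._tables.get(tag)

    def map_char_to_glyph(self, code_point):
        return self._cmap.get(code_point, 0)
```

Code written against such an interface can, for instance, check `font.get_table(make_tag("GSUB")) is None` to decide whether to fall back to a canned table, without knowing how or where the font is stored.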

So if neither Pango nor ICU like their own Indic shapers, maybe Qt can
provide theirs? Is there any news regarding the copyright transfer?

> As you probably know, Microsoft changed the spec. for Indic  
> OpenType fonts for Vista. As soon as they publish the new spec.  
> we'll have to adapt our shaper to deal with the new fonts. I don't  
> know all the details of what's required yet, but it looks like the  
> shaper may have to "probe" the 'GSUB' table w/ trial lookups to  
> determine, for example, which characters have pre- and post-base  
> forms.

Does MS have a rationale for this change?

> The ICU LayoutEngine has a few other tricks in it that HarfBuzz  
> might want. For example, it uses a "canned" 'GSUB' table to do  
> presentation form based shaping of Arabic text if there's no 'GSUB'  
> table in the font.
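As a rough illustration of presentation-form fallback, the Python sketch below skips the 'GSUB' table format altogether and maps letters directly to Unicode Arabic Presentation Forms-B code points according to their joining context. It is a simplification under stated assumptions, not ICU's canned table: it covers only eight letters, treats every other character as non-joining, and ignores transparent marks and the lam-alef ligatures. The function names are hypothetical.

```python
# Hypothetical fallback table: (isolated, final, initial, medial) code points
# from Arabic Presentation Forms-B. Right-joining letters have no initial or
# medial forms, marked with None.
FORMS = {
    "\u0627": (0xFE8D, 0xFE8E, None, None),      # ALEF (right-joining)
    "\u0628": (0xFE8F, 0xFE90, 0xFE91, 0xFE92),  # BEH
    "\u062A": (0xFE95, 0xFE96, 0xFE97, 0xFE98),  # TEH
    "\u062F": (0xFEA9, 0xFEAA, None, None),      # DAL (right-joining)
    "\u0631": (0xFEAD, 0xFEAE, None, None),      # REH (right-joining)
    "\u0644": (0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0),  # LAM
    "\u0645": (0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4),  # MEEM
    "\u0646": (0xFEE5, 0xFEE6, 0xFEE7, 0xFEE8),  # NOON
}


def dual_joining(ch):
    """True if ch is in the table and can also join to the following letter."""
    forms = FORMS.get(ch)
    return forms is not None and forms[2] is not None


def shape_presentation_forms(text):
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # not covered by the table: leave unchanged
            continue
        # Joins the previous letter only if that letter links forward.
        joins_prev = i > 0 and dual_joining(text[i - 1])
        # Joins the next letter only if this letter is dual-joining and the
        # next character is a letter from the table.
        joins_next = dual_joining(ch) and i + 1 < len(text) and text[i + 1] in FORMS
        isolated, final, initial, medial = forms
        if joins_prev and joins_next:
            cp = medial
        elif joins_prev:
            cp = final
        elif joins_next:
            cp = initial
        else:
            cp = isolated
        out.append(chr(cp))
    return "".join(out)
```

For example, BEH NOON TEH (U+0628 U+0646 U+062A) comes out as initial, medial and final forms (U+FE91 U+FEE8 U+FE96), while DAL ALEF REH stays all-isolated because none of those letters joins forward.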

Sounds cool.

> There's some similar code to deal with canonical forms. For  
> example, if the input text contains "a" followed by umlaut and the  
> font contains an a-umlaut glyph, it will substitute that. Also, if  
> the input text contains an a-umlaut character and the font does not  
> have a glyph for a-umlaut, it will substitute an "a" followed by an  
> umlaut. This produces better rendering of the "basic" scripts if  
> there's no 'GSUB' table.
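The composition/decomposition fallback described above can be sketched with Python's standard `unicodedata` module. This is a sketch under assumptions, not ICU's implementation: the font's coverage is modeled as a plain set of code points (the hypothetical `cmap` argument), a cluster is taken to be a base character plus any following combining marks, and the work is done with NFC/NFD normalization rather than a canned 'GSUB' table.

```python
import unicodedata


def clusters(text):
    """Split text into clusters: a base character plus following combining marks."""
    result = []
    for ch in text:
        if result and unicodedata.combining(ch):
            result[-1] += ch  # attach the mark to the current cluster
        else:
            result.append(ch)
    return result


def map_to_font(text, cmap):
    """Per cluster, prefer the precomposed form if the font covers it, else the
    decomposed form if the font covers every piece, else leave it unchanged."""
    out = []
    for cluster in clusters(text):
        composed = unicodedata.normalize("NFC", cluster)
        decomposed = unicodedata.normalize("NFD", cluster)
        if all(ord(c) in cmap for c in composed):
            out.append(composed)
        elif all(ord(c) in cmap for c in decomposed):
            out.append(decomposed)
        else:
            out.append(cluster)  # neither form fully covered
    return "".join(out)
```

With a font covering `a` and U+00E4, the input "a" + U+0308 becomes U+00E4; with a font covering only `a` and U+0308 COMBINING DIAERESIS, the input U+00E4 becomes "a" + U+0308.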

That's also a canned GSUB table?

I think canned tables would be easy to integrate with HarfBuzz, since
presumably they don't contain ICU-specific data structures...?

/Andreas