[HarfBuzz] Unified Text Layout Engine?

Fri Feb 2 05:59:35 PST 2007

On Thursday 01 February 2007 21:20, Andreas Vox wrote:
> Am 01.02.2007 um 20:38 schrieb Eric Mader:
> > Hi Behdad,
> >
> > The ICU LayoutEngine uses an abstract base class to represent fonts
> > and so is independent of any particular font format or OS. (Though
> > the model in the base class assumes that a font contains tables w/
> > four-byte names. ;-)
> >
> > I've thought for some time that the Indic shaper code in ICU is too
> > fragile. When I wrote it, I thought I could get away with a single
> > routine to analyze and tag all Indic scripts. Since then, I've
> > found out that there are lots of script-specific exceptions and
> > it's sometimes hard to fix one script without breaking any of the
> > others...
> >
> > A couple of years ago I talked w/ Owen Taylor about rewriting the
> > code to have a (potentially) different shaper for each script. At
> > that time I thought that almost all the bugs were fixed and it
> > wasn't worth the effort. Subsequent experience has shown this
> > belief to be optimistic. :-)
>
> So if neither Pango nor ICU likes its own Indic shaper, maybe Qt can
> provide theirs? Is there any news regarding copyright transfer?

We are absolutely willing to do so. It's not even a legal issue here, just a
matter of finding the time to do the work. Things have unfortunately been a
bit busier than we expected here in the last month.

Simon and I just need to find a few days to sit down and do the actual work.

> > As you probably know, Microsoft changed the spec. for Indic
> > OpenType fonts for Vista. As soon as they publish the new spec.
> > we'll have to adapt our shaper to deal with the new fonts. I don't
> > know all the details of what's required yet, but it looks like the
> > shaper may have to "probe" the 'GSUB' table w/ trial lookups to
> > determine, for example, which characters have pre- and post-base
> > forms.
>
> Does MS have a rationale for this change?
>
> > The ICU LayoutEngine has a few other tricks in it that HarfBuzz
> > might want. For example, it uses a "canned" 'GSUB' table to do
> > presentation form based shaping of Arabic text if there's no 'GSUB'
> > table in the font.
>
> Sounds cool.
>
> > There's some similar code to deal with canonical forms. For
> > example, if the input text contains "a" followed by umlaut and the
> > font contains an a-umlaut glyph, it will substitute that. Also, if
> > the input text contains an a-umlaut character and the font does not
> > have a glyph for a-umlaut, it will substitute an "a" followed by an
> > umlaut. This produces better rendering of the "basic" scripts if
> > there's no 'GSUB' table.
>
> That's also a canned GSUB table?
>
> I think canned tables would be easy to integrate with HarfBuzz since
> presumably they don't contain ICU specific data structures...?

Probably. Doing Arabic and ligatures by canned tables is a nice way of
handling them. It avoids having two separate codepaths for OpenType and
non-OpenType fonts. Once we have the shapers in HarfBuzz it should be rather
simple to add some canned tables for default handling.
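
To make the canonical-form fallback Eric describes a bit more concrete, here
is a rough sketch in Python. It is not ICU or HarfBuzz code: the function name
fallback_canonical and the font_has_glyph callback are made up for
illustration, and it leans on Python's unicodedata module instead of a canned
table. It only shows the idea of composing when the font has the precomposed
glyph and decomposing when it does not.

```python
import unicodedata


def fallback_canonical(text, font_has_glyph):
    """Adapt text to the glyphs a font actually has.

    font_has_glyph is a hypothetical callback (character -> bool) standing
    in for a real cmap lookup; it is an assumption of this sketch.
    """
    out = []
    for ch in text:
        # Case 1: base + combining mark, and the font has the precomposed
        # glyph (e.g. "a" + U+0308 -> U+00E4): substitute the composed form.
        if out:
            composed = unicodedata.normalize("NFC", out[-1] + ch)
            if len(composed) == 1 and font_has_glyph(composed):
                out[-1] = composed
                continue
        # Case 2: precomposed character the font lacks, but all pieces of
        # its canonical decomposition are available: substitute the pieces.
        if not font_has_glyph(ch):
            decomposed = unicodedata.normalize("NFD", ch)
            if len(decomposed) > 1 and all(font_has_glyph(c) for c in decomposed):
                out.extend(decomposed)
                continue
        # Otherwise keep the character unchanged.
        out.append(ch)
    return "".join(out)


# Two toy "fonts", each modelled as a set of supported characters.
composed_font = {"a", "\u00e4"}      # has a-umlaut, no combining umlaut
decomposed_font = {"a", "\u0308"}    # has combining umlaut, no a-umlaut

print(repr(fallback_canonical("a\u0308", composed_font.__contains__)))
print(repr(fallback_canonical("\u00e4", decomposed_font.__contains__)))
```

A canned table would presumably encode the same pairs as static data, so the
shaper could run them through the normal substitution path rather than a
special case like this one.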

Cheers,
Lars