[HarfBuzz] ICU LayoutEngine "Canned" GSUB Tables
emader at icu-project.org
Tue Jul 10 13:08:53 PDT 2007
The ICU LayoutEngine uses "canned" GSUB and GDEF tables to process
Arabic text if the font doesn't have a GSUB table that covers the Arabic
script. These tables use Unicode code points instead of glyph IDs.
The tables are generated by an ICU4J tool that uses the Unicode
character properties to identify ligatures and their components.
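
To illustrate how ligatures and their components can be read out of the
Unicode character properties, here is a rough Python sketch. It is not the
ICU4J tool; it assumes the compatibility decomposition mappings are the
property of interest, and the name ligature_components is made up for this
example.

```python
import unicodedata

# Decomposition tags that mark Arabic contextual presentation forms.
FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

def ligature_components(ch):
    """Return (form, [component code points]) if ch is a presentation-form
    ligature, otherwise None. Hypothetical helper, for illustration only."""
    decomp = unicodedata.decomposition(ch)
    if not decomp.startswith("<"):
        return None  # no compatibility decomposition
    tag, *parts = decomp.split()
    if tag not in FORM_TAGS or len(parts) < 2:
        return None  # not a contextual form, or a single letter rather than a ligature
    return tag.strip("<>"), [int(p, 16) for p in parts]

# U+FEFB is ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
print(ligature_components("\ufefb"))
```

A table generator could walk the presentation form blocks with a helper
like this and emit one ligature substitution per result.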
Because of this, the initial character to glyph conversion is 1 to 1, and a
"real" character to glyph conversion has to happen after GSUB processing. Ligature
substitution makes sure that the font actually contains the ligature
presentation form before forming the ligature and multiple substitution
makes sure that the font contains the component characters before
performing the substitution. This is done in ICU by passing an optional
"filter" object into GSUB processing. This object looks for the
characters in the font's CMAP table.
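
A minimal Python sketch of the filter idea follows. It only covers
ligature substitution, it skips the contextual form selection a real Arabic
shaper does first, and every name in it (CmapFilter, apply_ligatures,
map_to_glyphs) is hypothetical rather than taken from ICU.

```python
class CmapFilter:
    """Accepts a code point only if the font's cmap has an entry for it."""
    def __init__(self, cmap):
        self.cmap = cmap  # dict: code point -> glyph ID

    def accepts(self, code_point):
        return code_point in self.cmap

def apply_ligatures(text, ligatures, filt=None):
    """text: list of code points; ligatures: {component tuple: ligature code point}.
    A ligature is formed only if the optional filter accepts its code point."""
    out, i = [], 0
    while i < len(text):
        for components, lig in ligatures.items():
            n = len(components)
            if tuple(text[i:i + n]) == components and (filt is None or filt.accepts(lig)):
                out.append(lig)
                i += n
                break
        else:
            # No ligature matched (or the font lacks it): keep the character.
            out.append(text[i])
            i += 1
    return out

def map_to_glyphs(code_points, cmap):
    """The "real" character to glyph mapping, done after substitution."""
    return [cmap.get(cp, 0) for cp in code_points]  # 0 = missing glyph

ligatures = {(0x0644, 0x0627): 0xFEFB}  # lam + alef -> lam-alef isolated form
cmap = {0x0644: 10, 0x0627: 11}         # made-up font without U+FEFB
shaped = apply_ligatures([0x0644, 0x0627], ligatures, CmapFilter(cmap))
print(map_to_glyphs(shaped, cmap))
```

With this made-up cmap the ligature is not formed and the two letters are
mapped individually; adding an entry for U+FEFB would let the substitution
go through.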
In ICU, the shaper that does this is a subclass of the OpenType Arabic
shaper. It references the canned GSUB and GDEF tables instead of the
tables from the font and reimplements the character to glyph and post
The same GSUB and GDEF tables are used for ICU's canonical processing.
This processing is intended to produce better display results for fonts
that may have a limited repertoire. For example, if the input text
contains "a" followed by umlaut, an a-umlaut character will be
substituted if it's present in the font. Also, if the input text
contains an a-umlaut character and the font doesn't have a glyph for it,
it will be replaced by an "a" followed by an umlaut.
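
The effect of the canonical processing can be approximated in Python with
the standard normalization forms. This is only an approximation of the
behavior described above: ICU does this through the canned GSUB tables, not
through NFC/NFD, and fit_to_font and has_glyph are names invented for the
sketch.

```python
import unicodedata

def fit_to_font(text, has_glyph):
    """Prefer precomposed characters the font has; decompose the ones it lacks.
    has_glyph is a callable standing in for a cmap lookup (an assumption)."""
    out = []
    # Compose first, so "a" + combining diaeresis becomes a-umlaut.
    for ch in unicodedata.normalize("NFC", text):
        if has_glyph(ch):
            out.append(ch)
        else:
            # Fall back to the canonical decomposition of this character.
            out.append(unicodedata.normalize("NFD", ch))
    return "".join(out)

# Font with a-umlaut: the sequence is composed.
print(fit_to_font("a\u0308", lambda ch: True) == "\u00e4")
# Font without a-umlaut: the character is split into "a" + U+0308.
print(fit_to_font("\u00e4", lambda ch: ch != "\u00e4") == "a\u0308")
```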
I spent some time on Friday morning at the summit looking at how to
integrate this functionality into the HarfBuzz Arabic shaper. The
obvious thing that needs to be added is the filter. The low-level GSUB
routines will need to take an optional filter that can be used for
ligature substitution and multiple substitution.
I thought that maybe the canned tables could be made available by
hacking the code that looks up the tables to just return the canned ones
if the font doesn't have a "real" one. This won't really work though. For
one thing, we should use the canned tables if the font contains a GSUB
table that doesn't cover the Arabic script. Also, the caller needs to
know if the substitution happened so that it can pass in the correct
filter and do the right character to glyph mapping. I'll have to spend
some more time studying this.
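
One possible shape for the lookup, sketched in Python under the two
constraints above: fall back when the GSUB table is missing or does not
cover Arabic, and tell the caller which case happened. The dict-based font
model and the name select_gsub are assumptions for illustration, not
HarfBuzz or ICU API.

```python
ARABIC_TAG = "arab"  # OpenType script tag for Arabic

def select_gsub(font_tables, canned_gsub, script=ARABIC_TAG):
    """Return (table, used_canned).
    font_tables is a hypothetical dict like {"GSUB": {"scripts": {...}}}."""
    gsub = font_tables.get("GSUB")
    if gsub is not None and script in gsub["scripts"]:
        return gsub, False
    # No GSUB at all, or a GSUB that does not cover the script.
    return canned_gsub, True

canned = {"scripts": {ARABIC_TAG}}
latin_only_font = {"GSUB": {"scripts": {"latn"}}}

table, used_canned = select_gsub(latin_only_font, canned)
if used_canned:
    # The caller would pass in the cmap filter and defer the real
    # character to glyph mapping until after GSUB processing.
    print("using canned tables")
```

Returning the flag alongside the table is what lets the caller pick the
right filter and mapping, which a transparent swap inside the table lookup
could not do.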
IBM GCoC - ICU Team