[HarfBuzz] How various HarfBuzz OpenType shapers work (was Re: Tai Tham Shaping Question #2 : MEDIAL RA)

Behdad Esfahbod behdad at behdad.org
Fri Dec 28 22:41:29 PST 2012


On 12-12-26 08:02 PM, Andrew Cunningham wrote:
> Hi Behdad,

Hi Andrew, everyone,

Sorry for being slow on the list or on fixing bugs.  I'm in Mexico City and
exploring the amazing tacos around town.  Hopefully I'll get back to fixing
all reported issues next week.


> Will the approach of using DFLT script work with a multiscript font?
> I.e. if I need to support Arabic (Jawi), Western Cham and Khmer?
> 
> And if I need to support reordering and ligatures DFLT will be ok?
> 
> Are all OT features supported by DFLT or only some?

Let me just explain what HarfBuzz exactly does:

Based on the Unicode script of the text and a set of hardcoded rules, we
choose which shaper to use.  The current rules are in this piece of code:


http://cgit.freedesktop.org/harfbuzz/tree/src/hb-ot-shape-complex-private.hh#n143

To summarize:

  - For Arabic, Mongolian, Syriac, N'ko, 'Phags-pa, and Mandaic, we always use
the Arabic shaper,

  - For Thai and Lao we use the Thai shaper,

  - For Brahmi-derived scripts that have a left-matra kind of character we use
the Indic shaper if and only if a non-DFLT script is found in the GSUB table,

  - For Khmer we use the Indic shaper if and only if the GSUB table has a
'pref' feature,

  - For Myanmar we only use the Indic shaper if the 'mym2' OpenType script is
present, and NOT if it's 'mymr',

  - Otherwise, use the default shaper.
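
The selection rules above can be sketched as a small lookup.  This is only an
illustration in Python, not the HarfBuzz code itself: the function name, the
script-name strings, and the parameters gsub_script_tag and has_pref are made
up for the example, and the set of Indic-like scripts is a deliberately
incomplete subset.

```python
# Illustrative sketch of the shaper-selection rules; all names are hypothetical.
ARABIC_LIKE = {"Arabic", "Mongolian", "Syriac", "Nko", "Phags-pa", "Mandaic"}
THAI_LIKE = {"Thai", "Lao"}
# Incomplete, illustrative subset of Brahmi-derived scripts with left matras.
INDIC_LIKE = {"Devanagari", "Bengali", "Tamil"}


def choose_shaper(script, gsub_script_tag="DFLT", has_pref=False):
    """Return the name of the shaper the rules above would pick.

    script          -- Unicode script name of the text run
    gsub_script_tag -- OpenType script tag chosen in GSUB ("DFLT" if none)
    has_pref        -- whether the GSUB table has a 'pref' feature
    """
    if script in ARABIC_LIKE:
        return "arabic"
    if script in THAI_LIKE:
        return "thai"
    if script in INDIC_LIKE:
        # Indic shaper only when the font has a real (non-DFLT) script.
        return "indic" if gsub_script_tag != "DFLT" else "default"
    if script == "Khmer":
        return "indic" if has_pref else "default"
    if script == "Myanmar":
        # 'mym2' selects the Indic shaper; 'mymr' does not.
        return "indic" if gsub_script_tag == "mym2" else "default"
    return "default"
```

For instance, choose_shaper("Devanagari", "DFLT") gives "default", while
choose_shaper("Devanagari", "dev2") gives "indic".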


After this, we do our custom normalization (which also handles Hangul Jamo
(de)composition as well as matra decompositions).  Then we map to glyphs, and
apply script-specific OpenType shaping.


Now, here's what each shaper does:


  - Default shaper does this:

    * Enables these features, in both GSUB and GPOS: ccmp, liga, locl, mark,
mkmk, rlig,

    * Depending on the text direction, enable for horizontal: calt, clig,
curs, kern, rclt, and for vertical: valt, vert, vkrn, vpal, vrt2,

    * For horizontal text, enable either ltra and ltrm, or rtla and rtlm
depending on the direction,

    * Enables a few features based on the script: For Hangul, enable ljmo,
vjmo, and tjmo.  For Tibetan enable abvs, blws, abvm, and blwm,

    * All these features are enabled globally and applied together, except
for rtlm, which is only enabled in RTL runs and only for characters that do
NOT have a Unicode mirroring character.
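
As a rough sketch, the feature list of the default shaper could be assembled
like this.  The function name and the direction strings ("ltr", "rtl", "ttb",
"btt") are assumptions for the example; the per-character restriction on rtlm
is not modelled.

```python
def default_shaper_features(direction, script=None):
    """List the feature tags the default shaper enables (illustrative only).

    direction -- "ltr" or "rtl" for horizontal text, "ttb" or "btt" for vertical
    script    -- optional Unicode script name, e.g. "Hangul" or "Tibetan"
    """
    # Always enabled, in both GSUB and GPOS.
    features = ["ccmp", "liga", "locl", "mark", "mkmk", "rlig"]

    if direction in ("ltr", "rtl"):
        features += ["calt", "clig", "curs", "kern", "rclt"]
        # Direction-specific alternates and mirroring.
        features += ["ltra", "ltrm"] if direction == "ltr" else ["rtla", "rtlm"]
    else:
        # Vertical text; vrt2 is the registered tag for vertical rotation.
        features += ["valt", "vert", "vkrn", "vpal", "vrt2"]

    # Script-specific additions; blwm is the registered below-base mark tag.
    if script == "Hangul":
        features += ["ljmo", "vjmo", "tjmo"]
    elif script == "Tibetan":
        features += ["abvs", "blws", "abvm", "blwm"]
    return features
```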


  - The Thai shaper does three things:

    * Use PUA-encoded shaping (using MS and Mac Unicode encodings) if there is
no GSUB found,

    * Do SARA AM decomposition and reordering,

    * Apply all features from the default shaper.
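
To illustrate the SARA AM step for Thai: the list above only says
"decomposition and reordering", so the details in this sketch are assumptions.
It decomposes SARA AM (U+0E33) into NIKHAHIT (U+0E4D) + SARA AA (U+0E32) and
moves the NIKHAHIT back over any tone marks (U+0E48..U+0E4B) that directly
precede it.  The function name is made up, and Lao is not handled.

```python
SARA_AM = "\u0E33"
NIKHAHIT = "\u0E4D"
SARA_AA = "\u0E32"
# Assumed set of Thai tone marks the NIKHAHIT is moved over.
TONE_MARKS = {"\u0E48", "\u0E49", "\u0E4A", "\u0E4B"}


def decompose_sara_am(text):
    """Decompose each SARA AM and reorder the resulting NIKHAHIT."""
    out = []
    for ch in text:
        if ch != SARA_AM:
            out.append(ch)
            continue
        # Walk back over tone marks that directly precede the SARA AM.
        i = len(out)
        while i > 0 and out[i - 1] in TONE_MARKS:
            i -= 1
        out.insert(i, NIKHAHIT)  # NIKHAHIT goes before those tone marks
        out.append(SARA_AA)      # SARA AA stays where the SARA AM was
    return "".join(out)
```

With this, <0E14, 0E4B, 0E33> becomes <0E14, 0E4D, 0E4B, 0E32>.  A NIKHAHIT
that was already in the text is left where it is; only one coming from a
SARA AM is moved.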


  - The Arabic shaper does Arabic joining analysis and then does:

    * Apply ccmp and locl features together,

    * Apply one of init, medi, fina, isol, med2, fin2, or fin3 based on the
analysis result,

    * Apply rlig, then calt, then the three of cswh, dlig, and mset together,
as well as other features from the default shaper.
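
A much simplified sketch of the joining analysis follows.  It is not the
HarfBuzz state machine: it only knows the Unicode joining types D
(dual-joining), R (right-joining), U (non-joining), and T (transparent), and
it ignores the Syriac-specific med2, fin2, and fin3 forms.  The function name
and the input representation (one joining type per character, in logical
order) are assumptions for the example.

```python
def joining_forms(types):
    """Pick init/medi/fina/isol for each character from its joining type.

    types -- list of "D", "R", "U", or "T", in logical order.
    Returns a list of feature tags, with None for U and T characters.
    """
    n = len(types)
    joins_prev = [False] * n
    joins_next = [False] * n

    prev = None  # index of the last non-transparent character
    for i, t in enumerate(types):
        if t == "T":
            continue  # transparent characters do not break joining
        # A join needs a dual-joining letter before and a D or R letter after.
        if prev is not None and types[prev] == "D" and t in ("D", "R"):
            joins_next[prev] = True
            joins_prev[i] = True
        prev = i

    forms = []
    for i, t in enumerate(types):
        if t in ("T", "U"):
            forms.append(None)
        elif joins_prev[i] and joins_next[i]:
            forms.append("medi")
        elif joins_prev[i]:
            forms.append("fina")
        elif joins_next[i]:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For example, BEH ALEF BEH has joining types D, R, D, and the sketch assigns
init, fina, isol.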


  - The Indic shaper does a bunch of things, all based on the MS Indic
OpenType spec, extended to support more scripts.  In particular, it breaks
text into syllables based on a grammar, and for each syllable does:

    * Apply these features globally: locl and ccmp,

    * Do initial-reordering,

    * Apply the following features in this order: nukt, akhn, rphf, rkrf,
pref, half, blwf, abvf, pstf, cfar, cjct, and vatu.  Of which, these ones are
applied globally: nukt, akhn, rkrf, cjct, and vatu, and the rest based on
analysis done on the syllable (and the font tables),

    * Do final-reordering,

    * Apply the following features all at the same time: init, pres, abvs,
blws, psts, haln, dist, abvm, blwm, as well as the other features from the
default shaper, except for liga which is turned off.  All of these are applied
globally except for init.
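
The order of operations in the Indic shaper can be written down as a simple
plan.  This is a sketch of the sequence only, with hypothetical names
(indic_plan, is_global); it assumes locl and ccmp form a single stage, and it
leaves out the default-shaper features applied in the last stage.

```python
# Features applied one after another, between the two reordering passes.
BASIC_FEATURES = ["nukt", "akhn", "rphf", "rkrf", "pref", "half",
                  "blwf", "abvf", "pstf", "cfar", "cjct", "vatu"]
# The basic features that are applied globally rather than per analysis.
BASIC_GLOBAL = {"nukt", "akhn", "rkrf", "cjct", "vatu"}
# Features applied all at the same time after final reordering.
OTHER_FEATURES = ["init", "pres", "abvs", "blws", "psts",
                  "haln", "dist", "abvm", "blwm"]


def indic_plan():
    """Return the ordered steps as ("apply", [tags]) or ("reorder", name)."""
    plan = [("apply", ["locl", "ccmp"]), ("reorder", "initial")]
    # Each basic feature gets its own stage, in the listed order.
    plan += [("apply", [tag]) for tag in BASIC_FEATURES]
    plan.append(("reorder", "final"))
    # The remaining features share one stage.
    plan.append(("apply", list(OTHER_FEATURES)))
    return plan


def is_global(tag):
    """True if the feature applies to every glyph, not only analysed ones."""
    if tag in BASIC_FEATURES:
        return tag in BASIC_GLOBAL
    return tag != "init"
```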


Indic initial-reordering consists of (omitting lots of details):

  - Find base character.  Different rules are used for different categories of
scripts,

  - Reorder characters based on their desired position in the syllable,

  - Mark the Reph sequence to be reordered later.


Indic final-reordering consists of (omitting lots of details):

  - Reorder left matra to the desired position,

  - Reorder formed Reph to the desired position after base,

  - Reorder pre-base reordering consonants.



That's about it, at a high level.  So, to answer your question, if you depend
on Indic reordering, you cannot rely on the DFLT script.  If you do NOT depend
on such reordering, it is best to use the DFLT script such that the reordering
rules do NOT interfere with your GSUB rules.  It is fine to mix tables for
multiple scripts all under DFLT as long as the shaper-selection rules above
work for you.

Hope that answers your questions.

Cheers,
behdad


