[HarfBuzz] How various HarfBuzz OpenType shapers work (was Re: Tai Tham Shaping Question #2 : MEDIAL RA)

Andrew Cunningham lang.support at gmail.com
Sat Dec 29 01:21:43 PST 2012


Thanks Behdad,

Very useful and informative. Thanks.

Enjoy your tacos.

Andrew
On 29/12/2012 5:41 PM, "Behdad Esfahbod" <behdad at behdad.org> wrote:

> On 12-12-26 08:02 PM, Andrew Cunningham wrote:
> > Hi Behdad,
>
> Hi Andrew, everyone,
>
> Sorry for being slow on the list or on fixing bugs.  I'm in Mexico City and
> exploring the amazing tacos around town.  Hopefully I'll get back to fixing
> all reported issues next week.
>
>
> > Will the approach of using DFLT script work with a multiscript font?
> > I.e. if I need to support Arabic (Jawi), Western Cham and Khmer?
> >
> > And if I need to support reordering and ligatures DFLT will be ok?
> >
> > Are all OT features supported by DFLT or only some?
>
> Let me just explain what HarfBuzz exactly does:
>
> Based on the Unicode script of the text and a set of hardcoded rules, we
> choose which shaper to use.  The current rules are in this piece of code:
>
>
>
> http://cgit.freedesktop.org/harfbuzz/tree/src/hb-ot-shape-complex-private.hh#n143
>
> To summarize:
>
>   - For Arabic, Mongolian, Syriac, N'ko, 'Phags-pa, and Mandaic, we always
> use the Arabic shaper,
>
>   - For Thai and Lao we use the Thai shaper,
>
>   - For Brahmi-derived scripts that have a left-matra kind of character we
> use the Indic shaper if and only if a non-DFLT GSUB table is found,
>
>   - For Khmer we use the Indic shaper if and only if the GSUB table has a
> 'pref' feature,
>
>   - For Myanmar we only use the Indic shaper if the 'mym2' OpenType script is
> present, and NOT if it's 'mymr',
>
>   - Otherwise, use the default shaper.
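A rough Python sketch of the selection rules summarized above. It is not HarfBuzz code: the function name, the parameter names, and the script-name strings are hypothetical, and the set of left-matra scripts is a small illustrative subset (the full list lives in the linked source file). It also assumes that "non-DFLT GSUB table" means the script tag chosen from GSUB is something other than DFLT.

```python
# Illustrative script groupings; names are plain strings, not HarfBuzz constants.
ARABIC_LIKE = {"Arabic", "Mongolian", "Syriac", "Nko", "Phags_Pa", "Mandaic"}
THAI_LIKE = {"Thai", "Lao"}
# Assumption: a tiny subset of the Brahmi-derived scripts with left matras.
LEFT_MATRA_SCRIPTS = {"Devanagari", "Bengali", "Tamil"}


def choose_shaper(script, gsub_script_tag="DFLT", gsub_features=frozenset()):
    """Return the name of the shaper the summarized rules would pick."""
    if script in ARABIC_LIKE:
        return "arabic"
    if script in THAI_LIKE:
        return "thai"
    if script in LEFT_MATRA_SCRIPTS:
        # Indic shaper only when the font has a non-DFLT GSUB script.
        return "indic" if gsub_script_tag != "DFLT" else "default"
    if script == "Khmer":
        # Indic shaper only when GSUB carries a 'pref' feature.
        return "indic" if "pref" in gsub_features else "default"
    if script == "Myanmar":
        # Indic shaper only for the 'mym2' script tag, not for 'mymr'.
        return "indic" if gsub_script_tag == "mym2" else "default"
    return "default"
```

For example, `choose_shaper("Khmer", gsub_features={"pref"})` picks the Indic shaper, while a Khmer font that keeps everything under DFLT and has no 'pref' feature falls through to the default shaper.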
>
>
> After this, we do our custom normalization (which also handles Hangul Jamo
> (de)composition as well as matra decompositions).  Then we map to glyphs and
> apply script-specific OpenType shaping.
>
>
> Now, here's what each shaper does:
>
>
>   - Default shaper does this:
>
>     * Enables these features, in both GSUB and GPOS: ccmp, liga, locl, mark,
> mkmk, rlig,
>
>     * Depending on the text direction, enable for horizontal: calt, clig,
> curs, kern, rclt, and for vertical: valt, vert, vkrn, vpal, vrt2,
>
>     * For horizontal text, enable either ltra and ltrm, or rtla and rtlm
> depending on the direction,
>
>     * Enables a few features based on the script: For Hangul, enable ljmo,
> vjmo, and tjmo.  For Tibetan enable abvs, blws, abvm, and blwm,
>
>     * All these features are enabled globally except for rtlm which is only
> enabled for RTL runs and characters that do NOT have a Unicode mirroring
> character, and applied together.
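The default shaper's feature choice described above can be sketched in Python as follows. This is an illustration only: the function name and the direction strings ("ltr", "rtl", "ttb", "btt") are hypothetical, and the tags are written in their registered OpenType spellings (vrt2, blwm). It does not model the per-character restriction on rtlm.

```python
COMMON = ["ccmp", "liga", "locl", "mark", "mkmk", "rlig"]
HORIZONTAL = ["calt", "clig", "curs", "kern", "rclt"]
VERTICAL = ["valt", "vert", "vkrn", "vpal", "vrt2"]
# Script-specific extras mentioned in the description.
SCRIPT_EXTRA = {
    "Hangul": ["ljmo", "vjmo", "tjmo"],
    "Tibetan": ["abvs", "blws", "abvm", "blwm"],
}


def default_features(direction, script=None):
    """List the feature tags the default shaper would enable."""
    feats = list(COMMON)
    if direction in ("ltr", "rtl"):
        feats += HORIZONTAL
        # Direction-dependent alternates and mirroring features.
        feats += ["ltra", "ltrm"] if direction == "ltr" else ["rtla", "rtlm"]
    else:
        # Anything else is treated as vertical in this sketch.
        feats += VERTICAL
    feats += SCRIPT_EXTRA.get(script, [])
    return feats
```

Calling `default_features("rtl")` yields the common and horizontal features plus rtla and rtlm; `default_features("ttb")` swaps the horizontal set for the vertical one.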
>
>
>   - The Thai shaper does three things:
>
>     * Use PUA-encoded shaping (using MS and Mac Unicode encodings) if there is
> no GSUB found,
>
>     * Do SARA AM decomposition and reordering,
>
>     * Apply all features from the default shaper.
>
>
>   - The Arabic shaper does Arabic joining analysis and then does the following:
>
>     * Apply ccmp and locl features together,
>
>     * Apply one of init, medi, fina, isol, med2, fin2, or fin3 based on the
> analysis result,
>
>     * Apply rlig, then calt, then the three of cswh, dlig, and mset together,
> as well as other features from the default shaper.
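As a sketch of the ordering just described, seen from a single glyph's point of view: each inner list is one stage, and the tags within a stage are applied together. The function name is hypothetical, the joining form is assumed to come from a separate joining analysis that is not shown, and the isolated form uses its registered tag, isol. The default-shaper features that follow are omitted.

```python
JOINING_FORMS = ("isol", "fina", "fin2", "fin3", "medi", "med2", "init")


def arabic_stages(form):
    """Return the ordered feature stages for a glyph with the given joining form."""
    if form not in JOINING_FORMS:
        raise ValueError("unknown joining form: %r" % (form,))
    return [
        ["ccmp", "locl"],          # applied together
        [form],                    # exactly one positional form per glyph
        ["rlig"],
        ["calt"],
        ["cswh", "dlig", "mset"],  # applied together
    ]
```

The point of the staging is that a later stage sees the output of an earlier one, so rlig ligatures are formed from glyphs that already have their positional forms.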
>
>
>   - The Indic shaper does a bunch of things, all based on the MS Indic
> OpenType spec, extended to support more scripts.  In particular, it breaks
> text into syllables based on a grammar, and for each syllable does:
>
>     * Apply these features globally: locl and ccmp,
>
>     * Do initial-reordering,
>
>     * Apply the following features in this order: nukt, akhn, rphf, rkrf,
> pref, half, blwf, abvf, pstf, cfar, cjct, and vatu.  Of these, nukt, akhn,
> rkrf, cjct, and vatu are applied globally, and the rest based on the analysis
> done on the syllable (and the font tables),
>
>     * Do final-reordering,
>
>     * Apply the following features all at the same time: init, pres, abvs,
> blws, psts, haln, dist, abvm, blwm.  All of these are applied globally except
> for init.  Other features from the default shaper are applied as well, except
> for liga, which is turned off.
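The per-syllable sequence above can be written down as an ordered plan. This is a minimal Python sketch with hypothetical names and a made-up tuple layout of (step, payload, scope); it records only the order and the global/non-global split given in the description, and does no actual reordering or glyph substitution.

```python
BASIC = ["nukt", "akhn", "rphf", "rkrf", "pref", "half",
         "blwf", "abvf", "pstf", "cfar", "cjct", "vatu"]
BASIC_GLOBAL = {"nukt", "akhn", "rkrf", "cjct", "vatu"}
OTHER = ["init", "pres", "abvs", "blws", "psts", "haln", "dist", "abvm", "blwm"]


def indic_syllable_plan():
    """Return the ordered steps the Indic shaper runs for one syllable."""
    plan = [
        ("apply", ("locl", "ccmp"), "global"),
        ("reorder", "initial", None),
    ]
    # Basic features run one after another, in this exact order.
    for tag in BASIC:
        scope = "global" if tag in BASIC_GLOBAL else "by syllable analysis"
        plan.append(("apply", (tag,), scope))
    plan.append(("reorder", "final", None))
    # The remaining features are applied in a single stage.
    plan.append(("apply", tuple(OTHER), "global except init"))
    return plan
```

Iterating over `indic_syllable_plan()` makes the two reordering points visible: one before the basic features and one between the basic features and the presentation/positioning stage.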
>
>
> Indic initial-reordering consists of (omitting lots of details):
>
>   - Find base character.  Different rules are used for different categories
> of scripts,
>
>   - Reorder characters based on their desired position in the syllable,
>
>   - Make Reph sequence to be reordered later.
>
>
> Indic final-reordering consists of (omitting lots of details):
>
>   - Reorder left matra to the desired position,
>
>   - Reorder formed Reph to the desired position after base,
>
>   - Reorder pre-base reordering consonants.
>
>
>
> That's about it, at a high level.  So, to answer your question, if you depend
> on Indic reordering, you cannot rely on the DFLT script.  If you do NOT depend
> on such reordering, it is best to use the DFLT script so that the reordering
> rules do NOT interfere with your GSUB rules.  It is fine to mix tables for
> multiple scripts all under DFLT as long as the shaper-selection rules above
> work for you.
>
> Hope that answers your questions.
>
> Cheers,
> behdad
>