Thanks Behdad,

Very useful and informative. Thanks. Enjoy your tacos.

Andrew
<div class="gmail_quote">On 29/12/2012 5:41 PM, "Behdad Esfahbod" <<a href="mailto:behdad@behdad.org">behdad@behdad.org</a>> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 12-12-26 08:02 PM, Andrew Cunningham wrote:
> Hi Behdad,

Hi Andrew, everyone,

Sorry for being slow on the list or on fixing bugs. I'm in Mexico City and exploring the amazing tacos around town. Hopefully I'll get back to fixing all reported issues next week.

> Will the approach of using DFLT script work with a multiscript font?
> I.e. if I need to support Arabic (Jawi), Western Cham and Khmer?
>
> And if I need to support reordering and ligatures DFLT will be ok?
>
> Are all OT features supported by DFLT or only some?

Let me just explain exactly what HarfBuzz does:

Based on the Unicode script of the text and a set of hardcoded rules, we choose which shaper to use. The current rules are in this piece of code:

<a href="http://cgit.freedesktop.org/harfbuzz/tree/src/hb-ot-shape-complex-private.hh#n143" target="_blank">http://cgit.freedesktop.org/harfbuzz/tree/src/hb-ot-shape-complex-private.hh#n143</a>

To summarize:

  - For Arabic, Mongolian, Syriac, N'ko, 'Phags-pa, and Mandaic, we always use the Arabic shaper,

  - For Thai and Lao we use the Thai shaper,

  - For Brahmi-derived scripts that have a left-matra kind of character we use the Indic shaper if and only if a non-DFLT GSUB table is found,

  - For Khmer we use the Indic shaper if and only if the GSUB table has a 'pref' feature,

  - For Myanmar we only use the Indic shaper if the 'mym2' OpenType script is present, and NOT if it's 'mymr',

  - Otherwise, use the default shaper.

After this, we do our custom normalization (which also handles Hangul Jamo (de)composition as well as matra decompositions). Then we map to glyphs, and apply script-specific OpenType shaping.
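As a rough illustration of the selection rules summarized above, here is a minimal Python sketch. It is not HarfBuzz's actual code: the function name `choose_shaper`, its parameters, and the use of ISO 15924 script codes are assumptions made for the example. The set of left-matra Brahmi-derived scripts is an illustrative subset only, since the full list is not given here, and "non-DFLT GSUB table" is read as "the script chosen in GSUB is something other than DFLT".

```python
# Hypothetical sketch of the shaper-selection rules; not HarfBuzz source code.

ARABIC_LIKE = {"Arab", "Mong", "Syrc", "Nkoo", "Phag", "Mand"}
THAI_LIKE = {"Thai", "Laoo"}
# Assumption: illustrative subset only; the real list is longer.
LEFT_MATRA_BRAHMIC = {"Deva", "Beng", "Taml"}


def choose_shaper(unicode_script, gsub_script=None, gsub_features=()):
    """Return 'arabic', 'thai', 'indic', or 'default'.

    unicode_script: ISO 15924 code of the text run (e.g. 'Khmr').
    gsub_script:    OpenType script tag chosen in GSUB, or None if there is none.
    gsub_features:  feature tags available in GSUB for that script.
    """
    features = set(gsub_features)
    if unicode_script in ARABIC_LIKE:
        return "arabic"
    if unicode_script in THAI_LIKE:
        return "thai"
    if unicode_script in LEFT_MATRA_BRAHMIC:
        # Indic shaper only when the font has a non-DFLT GSUB script.
        return "indic" if gsub_script not in (None, "DFLT") else "default"
    if unicode_script == "Khmr":
        # Khmer uses the Indic shaper only if GSUB has a 'pref' feature.
        return "indic" if "pref" in features else "default"
    if unicode_script == "Mymr":
        # Myanmar: 'mym2' selects the Indic shaper, 'mymr' does not.
        return "indic" if gsub_script == "mym2" else "default"
    return "default"
```

Under these assumptions, `choose_shaper("Arab", "DFLT")` picks the Arabic shaper regardless of the font, while `choose_shaper("Khmr", "DFLT", ["liga"])` falls through to the default shaper because there is no 'pref' feature.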
Now, here's what each shaper does:

  - The default shaper does this:

    * Enables these features, in both GSUB and GPOS: ccmp, liga, locl, mark, mkmk, rlig,

    * Depending on the text direction, enables for horizontal: calt, clig, curs, kern, rclt, and for vertical: valt, vert, vkrn, vpal, vrt2,

    * For horizontal text, enables either ltra and ltrm, or rtla and rtlm, depending on the direction,

    * Enables a few features based on the script: for Hangul, ljmo, vjmo, and tjmo; for Tibetan, abvs, blws, abvm, and blwm,

    * All these features are applied together. They are enabled globally, except for rtlm, which is only enabled in RTL runs for characters that do NOT have a Unicode mirroring character.

  - The Thai shaper does the following:

    * Uses PUA-encoded shaping (using MS and Mac Unicode encodings) if there is no GSUB found,

    * Does SARA AM decomposition and reordering,

    * Applies all features from the default shaper.

  - The Arabic shaper does Arabic joining analysis and then does:

    * Apply ccmp and locl features together,

    * Apply one of init, medi, fina, isol, med2, fin2, or fin3 based on the analysis result,

    * Apply rlig, then calt, then the three of cswh, dlig, and mset together, as well as other features from the default shaper.

  - The Indic shaper does a bunch of things, all based on the MS Indic OpenType spec, extended to support more scripts. In particular, it breaks text into syllables based on a grammar, and for each syllable does:

    * Apply these features globally: locl and ccmp,

    * Do initial-reordering,

    * Apply the following features in this order: nukt, akhn, rphf, rkrf, pref, half, blwf, abvf, pstf, cfar, cjct, and vatu. Of these, the following are applied globally: nukt, akhn, rkrf, cjct, and vatu, and the rest based on analysis done on the syllable (and the font tables),

    * Do final-reordering,

    * Apply the following features all at the same time: init, pres, abvs, blws, psts, haln, dist, abvm, blwm.
Of which, all are applied globally except for init, as well as other features from the default shaper, except for liga, which is turned off.

Indic initial-reordering consists of (omitting lots of details):

  - Find the base character. Different rules are used for different categories of scripts,

  - Reorder characters based on their desired position in the syllable,

  - Mark the Reph sequence to be reordered later.

Indic final-reordering consists of (omitting lots of details):

  - Reorder the left matra to the desired position,

  - Reorder the formed Reph to the desired position after the base,

  - Reorder pre-base reordering consonants.

That's about it, at a high level. So, to answer your question: if you depend on Indic reordering, you cannot rely on the DFLT script. If you do NOT depend on such reordering, it is best to use the DFLT script so that the reordering rules do NOT interfere with your GSUB rules. It is fine to mix tables for multiple scripts all under DFLT as long as the shaper-selection rules above work for you.

Hope that answers your questions.

Cheers,
behdad
</blockquote></div>