[HarfBuzz] Documenting OpenType shaping

Richard Wordingham richard.wordingham at ntlworld.com
Sat Jun 16 23:50:06 UTC 2018


On Fri, 15 Jun 2018 17:53:41 -0500
Nathan Willis <nwillis at glyphography.com> wrote:

> On Wed, Jun 6, 2018 at 2:29 PM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:  
> 
> > On Tue, 5 Jun 2018 09:42:38 -0500
> > Nathan Willis <nwillis at glyphography.com> wrote:
> >  
> > > Your feedback and help is appreciated!  
> >
> > * Malayalam Remarks *
> >
> > In Sections 2.2 and 2.3, how are multiple vowels handled, such as
> > U+0D4A and U+0D4B?  I'm particularly interested in the handling of
> > multiple left matras.
> >  
> 
> Hmm. So, as I understand it, in HarfBuzz the presence of multiple
> matras (on any side) would be an issue dealt with by the
> syllable-identification regular expressions, before getting to the
> reordering stuff.
> 
> It seems like this is what is used (the same regexps being used for
> all scripts in HarfBuzz's Indic shaper):
> 
> matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
> [...]
> halant_or_matra_group = (final_halant_group | (H.ZWJ)?
> matra_group{0,4});
> 
> ... and that only permits four matras (total) per syllable.
> 
> I vaguely recall seeing a commit message or comment or something
> indicating that this limit was there to maintain compatibility with
> how Uniscribe matches syllables, but I searched around and couldn't
> find it today. It was something along the lines of the Microsoft docs
> saying "one matra for each type [L,R,T,B] is permitted," but it
> isn't clear whether that's justified by orthography at all or is just
> a practical concession that they made for some reason.

It looks more like a desire to prohibit as many unusual combinations as
they can.
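
To make the effect of the {0,4} bound concrete, here is a minimal
Python sketch.  It is not HarfBuzz's actual state machine: it assumes a
hypothetical one-letter category alphabet (C = consonant, M = matra,
N = nukta, H = halant) and leaves out the joiners (z), forced_rakar and
the final_halant_group alternative from the rules quoted above.

```python
import re

# Simplified matra_group: a matra, an optional nukta, an optional halant.
MATRA_GROUP = r"MN?H?"

# One consonant followed by at most four matra groups, mirroring the
# matra_group{0,4} bound in the quoted rule.
SYLLABLE = re.compile(rf"C(?:{MATRA_GROUP}){{0,4}}")


def matched_length(categories: str) -> int:
    """Return how many category letters the simplified syllable rule consumes."""
    m = SYLLABLE.match(categories)
    return m.end() if m else 0


print(matched_length("CMMMM"))   # four matras: all consumed
print(matched_length("CMMMMM"))  # five matras: the fifth is left over
```

In this sketch a fifth matra group is simply left outside the match;
what the shaper then does with it is not modelled here.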

> Others with more Uniscribe knowledge may know.
> 
> That having been said, I *think* that HarfBuzz doesn't rearrange two
> adjacent codepoints that have the same sort-ordering tags. So
> "Consonant,U+0D4A,U+0D4B" ought to get the matras decomposed, then
> the two left-side parts move together as-is to the left of the
> consonant, and the two right-side parts remain unchanged.
> 
> You could test that with
> hb-view /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf \
>     --unicodes=0d15,0d4a,0d4b

A more revealing test case is

hb-view /usr/share/fonts/truetype/ttf-indic-fonts-core/Meera_04.ttf \
    --unicodes=0d15,0d4b,0d4c

Remembering that U+0D4B decomposes to <U+0D47 SIGN EE, U+0D3E SIGN AA>
and U+0D4C decomposes to <U+0D46 SIGN E, U+0D57 AU LENGTH MARK>, it
yields the bizarre sequence <g0D47, g0D46, gKA, g0D3E, g0D57>, in
complete violation of the inside-out rule for combining marks in white
man's scripts.  This behaviour should be documented in some fashion.
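
The observed order is what one gets from canonical decomposition
followed by a stable sort on position tags.  The Python sketch below
illustrates that under stated assumptions: the tag values PRE, BASE and
POST, the LEFT_MATRAS set and the consonant range are hypothetical
simplifications of mine, not HarfBuzz's own tables.  Only the
decompositions come from Unicode.

```python
import unicodedata

# Hypothetical position tags; smaller values sort further left.
PRE, BASE, POST = 0, 1, 2

# Left-side Malayalam matra parts needed for this example (SIGN E, SIGN EE).
LEFT_MATRAS = {0x0D46, 0x0D47}
# Malayalam consonants KA..HA, treated here as possible bases.
CONSONANTS = range(0x0D15, 0x0D3A)


def decompose(text: str) -> list:
    """Canonically decompose the text and return its code points."""
    return [ord(ch) for ch in unicodedata.normalize("NFD", text)]


def position_tag(cp: int) -> int:
    if cp in LEFT_MATRAS:
        return PRE
    if cp in CONSONANTS:
        return BASE
    return POST


def reorder(cps: list) -> list:
    """Stable sort: items with equal tags keep their relative order."""
    return sorted(cps, key=position_tag)


cluster = decompose("\u0d15\u0d4b\u0d4c")
print([f"{cp:04X}" for cp in cluster])
print([f"{cp:04X}" for cp in reorder(cluster)])
```

Because the sort is stable, the two left parts stay in encoding order
(0D47 before 0D46) instead of being reversed as the inside-out rule
would require.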

The behaviour of USE is worthy of comparison.  The sequence <U+1A20 TAI
THAM LETTER HIGH KA, U+1A70 TAI THAM VOWEL SIGN OO, U+1A6E TAI THAM
VOWEL SIGN E, U+1A63 TAI THAM VOWEL SIGN AA>, which is at best a
lexicographer's convention, is rendered <gOO, gE, gKA, gAA> in MS Edge
but as <gE, gOO, gKA, gAA> by HarfBuzz, which in this case observes the
inside-out rule.
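
The two results differ only in whether the preposed vowels keep their
encoding order or are reversed.  A small Python sketch of both
policies follows; the function name, the glyph-name table and the
PREPOSED set are hypothetical and cover just this one sequence, and
neither branch is claimed to be how either shaper is implemented.

```python
# Glyph names as used in the text above.
NAMES = {0x1A20: "gKA", 0x1A70: "gOO", 0x1A6E: "gE", 0x1A63: "gAA"}

# The two vowels in this example that end up to the left of the base.
PREPOSED = {0x1A6E, 0x1A70}


def prepose(cluster, inside_out):
    """Move preposed vowels before the base (the first code point).

    inside_out=False keeps them in encoding order;
    inside_out=True puts the earliest-encoded vowel closest to the base.
    """
    base, marks = cluster[0], cluster[1:]
    left = [cp for cp in marks if cp in PREPOSED]
    rest = [cp for cp in marks if cp not in PREPOSED]
    if inside_out:
        left.reverse()
    return [NAMES[cp] for cp in left + [base] + rest]


cluster = [0x1A20, 0x1A70, 0x1A6E, 0x1A63]
print(prepose(cluster, inside_out=False))  # the order reported for MS Edge
print(prepose(cluster, inside_out=True))   # the order reported for HarfBuzz
```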

> > In Section 3, how does tagging interact with substitutions?
> > Features can in general split and merge glyphs.
> >
> >  
> The tagging described in stage 2 is just the reordering /
> syllable-position tags. So after all that is done, the
> sort-the-syllable-into-final-sort-order step is (AIUI) the last time
> that the tags come into play.


> I do know that HarfBuzz keeps track of other sorts of state that it
> may refer to internally as tags, but I don't think any of these docs
> reference those, just the reordering position tags.
> 
> So applying the features in stage 3 doesn't interact with the tags —
> at least, not directly. If the tagging was wrong, of course, then the
> final sorted order might be wrong and sequences wouldn't match up to
> the substitution rules in GSUB.  But, if I follow HarfBuzz's logic
> right, the reordering stuff cannot be switched off, so it always
> happens completely before any substitutions start, and that seems to
> be what other shapers did first.
> 
> Should there be a wording change to address that in the document
> itself?

In the Indian Indic scripts, there are reportedly three steps:

Initial reordering
'Mandatory' substitutions
Final reordering

Are you saying that the 'final reordering' is a no-op?  This cannot
be the case.  The rendering of <U+0926 DA, U+094D VIRAMA, U+0926 DA,
U+093F SIGN I> depends on whether there is a conjunct D.DA in the
font.  Assuming DA has no formal half-form, there are two possible
normal renderings: <gDA, gVIRAMA, gI, gDA> and <gI, gD.DA>.
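
The dependence on the substitution outcome can be sketched in a few
lines of Python.  This is a toy model of the two renderings above, not
a description of any shaper's code: the function name, the
font_has_conjunct flag and the placement rule (put the matra after the
last virama that survived substitution) are assumptions made for
illustration.

```python
from typing import List


def shape_dda_i(font_has_conjunct: bool) -> List[str]:
    """Toy model of shaping <DA, VIRAMA, DA, SIGN I>."""
    # Glyphs in encoding order.
    glyphs = ["gDA", "gVIRAMA", "gDA", "gI"]

    # 'Mandatory' substitution: the conjunct forms only if the font has it.
    if font_has_conjunct:
        glyphs = ["gD.DA", "gI"]

    # Final reordering: the pre-base matra goes immediately after the last
    # surviving virama, or to the very front if no virama is left.
    matra = glyphs.pop()
    insert_at = 0
    for i, glyph in enumerate(glyphs):
        if glyph == "gVIRAMA":
            insert_at = i + 1
    glyphs.insert(insert_at, matra)
    return glyphs


print(shape_dda_i(font_has_conjunct=False))
print(shape_dda_i(font_has_conjunct=True))
```

The matra's final position cannot be decided until the substitution
step has run, which is why the final reordering has work to do.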

What gets moved when?  You say the initial reordering 'may mean moving
dependent-vowel (matra) glyphs', and then say, 'The final reordering
stage repositions marks, dependent-vowel (matra) signs, and "Reph"
glyphs to the appropriate location with respect to the base consonant'.

In the USE, there is no initial reordering.  In my code-revealing font,
Dalekh Si, which is designed for use with a spell-checker (and works
well in Firefox), I split preposed vowels into a part that moves and an
ink-free part that stays put.  I use the ink-free part to colour
consonants that follow vowels within the akshara.  Now this works in MS
Edge and Firefox, but I don't know whether I'm just lucky.

Richard.

