[HarfBuzz] Contextual shaping of Malayalam post(pre)/below base forms

Wed Jun 26 16:59:05 PDT 2013

On Wed, 26 Jun 2013 19:09:54 -0400
Behdad Esfahbod <behdad at behdad.org> wrote:

> Humm.  Ok, now I know.  In the particular sequence y1 is the base,
> right?  If yes, then it is NOT marked for the pref feature.  <pref>
> is applied to all characters after the base.

Glyph y1 was the base in my test case, but the problem still applies
if I expand the input to the unlikely sequence <PA, VIRAMA, YA,
VIRAMA, RA> (as in the string പ്യ്രക്രഖ്ര ).  The <VIRAMA, RA> still forms the
pre-base ra.

>  I believe that's why your original design wasn't working.

The logic to mark for the <pref> feature in 0.9.18 is 

    for (unsigned int i = base + 1; i + 1 < end; i++) {
      hb_codepoint_t glyphs[2] = {info[i].codepoint, info[i + 1].codepoint};
      if (indic_plan->pref.would_substitute (glyphs, ARRAY_LENGTH (glyphs), true, face))
      {
        info[i++].mask |= indic_plan->mask_array[PREF];
        info[i++].mask |= indic_plan->mask_array[PREF];
        ...

in function initial_reordering_consonant_syllable() in
hb-ot-shape-complex-indic.cc.

I surmise that the marking is more specific for <pref> because pre-base
rearrangement looks for an isolated glyph marked for <pref>; this
should work, because Uniscribe forbids Malayalam from having two
pre-base ra's in the same syllable.

> Still I'm
> interested to double check that Uniscribe does the same.

Test font is being sent off-list.

> > I haven't yet worked out how HarfBuzz distinguishes a pre-base
> > <VIRAMA, RA> ligature formed by the <pref> feature from a post-base
> > <VIRAMA, RA> ligature formed by the <pstf> feature,

> It doesn't.

One fix would be to remove <pref> marking from consecutive pairs of
glyphs immediately after the <pref> feature has been applied.  I can't
quantify the practical benefit.
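
A minimal sketch of that idea follows, written as standalone C++ rather
than against the HarfBuzz sources.  Glyph, PREF_MASK and
clear_unligated_pref are hypothetical names, not HarfBuzz API.  It
assumes that a pair which <pref> actually ligated is left as a single
marked glyph, so any run of two or more adjacent marked glyphs is
treated as a pair that did not ligate and is unmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hb_glyph_info_t; only the fields this sketch needs.
struct Glyph {
  uint32_t codepoint;
  uint32_t mask;
};

// Hypothetical bit standing in for indic_plan->mask_array[PREF].
constexpr uint32_t PREF_MASK = 1u << 0;

// Clears PREF_MASK from every run of two or more adjacent marked glyphs in
// [start, end).  A lone marked glyph (assumed to be a ligature that <pref>
// formed) keeps its mark.  Returns the number of glyphs that were unmarked.
std::size_t clear_unligated_pref(std::vector<Glyph>& info,
                                 std::size_t start, std::size_t end) {
  assert(start <= end && end <= info.size());
  std::size_t cleared = 0;
  std::size_t i = start;
  while (i < end) {
    if (!(info[i].mask & PREF_MASK)) {
      ++i;
      continue;
    }
    // Find the end of this run of marked glyphs.
    std::size_t run_end = i;
    while (run_end < end && (info[run_end].mask & PREF_MASK))
      ++run_end;
    // Assumption: a run of length >= 2 is a marked pair that did not ligate.
    if (run_end - i >= 2) {
      for (std::size_t j = i; j < run_end; ++j) {
        info[j].mask &= ~PREF_MASK;
        ++cleared;
      }
    }
    i = run_end;
  }
  return cleared;
}
```

This would run after <pref> has been applied and before the pre-base
rearrangement looks for its isolated marked glyph.  Two adjacent
ligatures would also lose their marks under this rule; as far as I can
see that only affects the two pre-base ra case that Uniscribe forbids
anyway.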

Richard.