[HarfBuzz] Contextual shaping of Malayalam post(pre)/below base forms

Tue Jun 18 15:46:23 PDT 2013

On Tue, 18 Jun 2013 19:33:05 +0530
Suresh P <sureshp at gmx.com> wrote:

> Richard Wordingham wrote:

> > The OpenType specification at
> > http://www.microsoft.com/typography/OpenTypeDev/malayalam/intro.htm
> > says:

> > "Reorder pre-base reordering consonants: If a pre-base reordering
> > consonant is found, reorder it according to the following rules:

> > 1.   Only reorder a glyph produced by substitution during
> > application of the <pref> feature. (Note that a font may shape a Ra
> > consonant with the <pref> feature generally but block it in certain
> >       contexts.)
> > ..."

> > This is exactly the logic you want.

> yes

I think the new logic is missing near line 996 of
hb-ot-shape-complex-indic.cc in 0.9.18, where the code reads:

  if (indic_plan->mask_array[PREF] && base + 2 < end)
  {
    /* Find a Halant,Ra sequence and mark it for pre-base reordering processing. */
    for (unsigned int i = base + 1; i + 1 < end; i++) {
      hb_codepoint_t glyphs[2] = {info[i].codepoint, info[i + 1].codepoint};
      if (indic_plan->pref.would_substitute (glyphs, ARRAY_LENGTH (glyphs), true, face)) {
	info[i++].mask |= indic_plan->mask_array[PREF];
	info[i++].mask |= indic_plan->mask_array[PREF];
...

Using the Meera font (Meera_04.ttf, Revision 4.0, date 12 April 2008),
with substitutions reduced to those for pre-base RA, the code enters
this block when processing the string <YA, VIRAMA, RA>.
Unfortunately, I stopped tracing the logic in sufficient detail after
this point.  I think the reordering is done before the pref lookup is
actually carried out, which would explain why <YA, VIRAMA, RA> is
rendered as <pre-base RA, YA>.

I expressed the active parts of the GSUB table to my font compiler as:

GSUB
  script mlm2
    language default ! List of feature entries follows - 1st 4 letters
                     ! are feature tag
        akhn_0 blwf_1 blws_2 half_3 haln_4 pres_5 pstf_6 psts_7 pref_rw
    end language
  end script
  feature pref_rw
      pref_lkp2 ! List of lookups for Malayalam script feature <pref>
  end feature
-- No lookups for other features!  (All commented out - the definitions
-- of features without lookups are not shown in this email.)
  lookup pref_lkp1
      type ligature
      subtable pref_st1
  end lookup
  lookup pref_lkp2
      type chained
      subtable pref_st2
  end lookup
end GSUB

lookup pref_st1
    xx r3 > r4 -- Glyphs identified by postscript names
end lookup     -- xx for VIRAMA, r3 for RA, y1 for YA, and r4 is
               -- pre-base subjoined RA
lookup pref_st2
    | y1 xx r3 | -- No sequence indices for this context!
    | xx r3 |
       0 pref_lkp1
end lookup
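
As a rough illustration of what the lookups above are meant to do, here
is a toy model in C++.  It is not HarfBuzz code and not a real OpenType
engine: it only assumes that the rules of pref_lkp2 are tried in order
at each position, that a matched rule with no nested lookup leaves the
glyphs unchanged, and that pref_lkp1 turns <xx, r3> into r4.  The
function names and the glyph name k1 (standing for KA) are made up for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Glyphs = std::vector<std::string>;

// True if `pattern` occurs in `g` starting at index `at`.
static bool matches_at(const Glyphs& g, std::size_t at, const Glyphs& pattern) {
    if (at + pattern.size() > g.size()) return false;
    for (std::size_t k = 0; k < pattern.size(); ++k)
        if (g[at + k] != pattern[k]) return false;
    return true;
}

// Toy stand-in for pref_lkp2: the two rules are tried in order.
Glyphs apply_pref_toy(const Glyphs& in) {
    Glyphs out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (matches_at(in, i, {"y1", "xx", "r3"})) {
            // Rule 1: matched, but no nested lookup, so copy unchanged.
            for (std::size_t k = 0; k < 3; ++k) out.push_back(in[i + k]);
            i += 3;
        } else if (matches_at(in, i, {"xx", "r3"})) {
            // Rule 2: nested ligature lookup pref_lkp1 gives r4.
            out.push_back("r4");
            i += 2;
        } else {
            out.push_back(in[i++]);
        }
    }
    return out;
}

// Usage: the YA cluster is blocked, a KA cluster (hypothetical glyph k1)
// gets the pre-base RA glyph.
void demo_pref_toy() {
    assert((apply_pref_toy({"y1", "xx", "r3"}) == Glyphs{"y1", "xx", "r3"}));
    assert((apply_pref_toy({"k1", "xx", "r3"}) == Glyphs{"k1", "r4"}));
}
```

Under these assumptions the font never produces r4 after YA, so rule 1
of the specification text quoted above would leave nothing to reorder
in that cluster.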

I hesitate to try fixing the code myself - checking whether the RA is
replaced, as opposed to whether a substitution occurs, needs good
knowledge of HarfBuzz internals.  Also, function
consonant_position_from_face() in the same file probably needs to be
changed so that the font may cause any consonant to be treated as a
pre-base subjoined form.  It looks as if a former return value of
POS_PRE_C has been optimised away, and restoring it looks like a
fruitful source of new errors.
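
To make the distinction concrete, here is a small sketch of the two
tests.  It is a toy model only: ToyGlyph, its two flags and both
functions are hypothetical and do not correspond to HarfBuzz's real
data structures.  The first function picks a reordering candidate from
the marking alone; the second also requires that the glyph was
actually produced by the <pref> substitution, as the specification
asks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-glyph record for this sketch.
struct ToyGlyph {
    std::string name;
    bool pref_mask;    // marked as a <pref> candidate before the lookup ran
    bool substituted;  // is the output of a <pref> substitution
};

// Candidate chosen from the marking alone; returns -1 if none.
int candidate_by_mask(const std::vector<ToyGlyph>& cluster) {
    for (std::size_t i = 0; i < cluster.size(); ++i)
        if (cluster[i].pref_mask) return static_cast<int>(i);
    return -1;
}

// Candidate chosen only if <pref> really replaced the glyph; -1 if none.
int candidate_by_result(const std::vector<ToyGlyph>& cluster) {
    for (std::size_t i = 0; i < cluster.size(); ++i)
        if (cluster[i].pref_mask && cluster[i].substituted)
            return static_cast<int>(i);
    return -1;
}

void demo_reorder_check() {
    // <YA, VIRAMA, RA>: marked, but the font blocked the substitution.
    std::vector<ToyGlyph> ya = {
        {"y1", false, false}, {"xx", true, false}, {"r3", true, false}};
    // <KA, VIRAMA, RA> after <pref>: k1 (hypothetical name) followed by r4.
    std::vector<ToyGlyph> ka = {{"k1", false, false}, {"r4", true, true}};

    assert(candidate_by_mask(ya) == 1);     // marking alone finds a candidate
    assert(candidate_by_result(ya) == -1);  // nothing was substituted
    assert(candidate_by_result(ka) == 1);   // r4 is the glyph to move
}
```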

Who should supply the font for testing?  The test strings
should probably be യ്രക്രഖ്രര്ര and ക്ലഖ്ലയ്ലര്ല .  I haven't looked at
the second problem yet.
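
In case it helps whoever writes the tests, the two strings can be built
from code points as sketched below.  The code point values are those
of the Unicode Malayalam block; the function names are made up for the
example.

```cpp
#include <cassert>
#include <string>

// Code points from the Unicode Malayalam block.
const char32_t KA = 0x0D15, KHA = 0x0D16, YA = 0x0D2F,
               RA = 0x0D30, LA = 0x0D32, VIRAMA = 0x0D4D;

// Append <first, VIRAMA, second> for each consonant in `firsts`.
std::u32string conjunct_string(const std::u32string& firsts, char32_t second) {
    std::u32string s;
    for (char32_t first : firsts) {
        s += first;
        s += VIRAMA;
        s += second;
    }
    return s;
}

// First test string: YA, KA, KHA and RA, each followed by VIRAMA, RA.
std::u32string ra_test_string() {
    return conjunct_string({YA, KA, KHA, RA}, RA);
}

// Second test string: KA, KHA, YA and RA, each followed by VIRAMA, LA.
std::u32string la_test_string() {
    return conjunct_string({KA, KHA, YA, RA}, LA);
}
```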

Richard.