[HarfBuzz] Contextual shaping of Malayalam post(pre)/below base forms

Wed Jun 19 01:39:17 PDT 2013

Richard Wordingham wrote:
> On Tue, 18 Jun 2013 19:33:05 +0530
> Suresh P <sureshp at gmx.com> wrote:
>
>> Richard Wordingham wrote:
>>> The OpenType specification at
>>> http://www.microsoft.com/typography/OpenTypeDev/malayalam/intro.htm
>>> says:
>>> "Reorder pre-base reordering consonants: If a pre-base reordering
>>> consonant is found, reorder it according to the following rules:
>>> 1.   Only reorder a glyph produced by substitution during
>>> application of the <pref> feature. (Note that a font may shape a Ra
>>> consonant with the <pref> feature generally but block it in certain
>>>        contexts.)
>>> ..."
>>> This is exactly the logic you want.
>> yes
> I think the new logic is missing in 0.9.18, near line 996 of
> hb-ot-shape-complex-indic.cc, where the code reads:
>
>    if (indic_plan->mask_array[PREF] && base + 2 < end)
>    {
>      /* Find a Halant,Ra sequence and mark it for pre-base reordering processing. */
>      for (unsigned int i = base + 1; i + 1 < end; i++) {
>        hb_codepoint_t glyphs[2] = {info[i].codepoint, info[i + 1].codepoint};
>        if (indic_plan->pref.would_substitute (glyphs, ARRAY_LENGTH (glyphs), true, face)) {
>          info[i++].mask |= indic_plan->mask_array[PREF];
>          info[i++].mask |= indic_plan->mask_array[PREF];
> ...
>
> Using the Meera font (Meera_04.ttf, Revision 4.0, date 12 April 2008),
> with substitutions reduced to those for pre-base RA, the code enters
> this block when processing the string <YA, VIRAMA, RA>.
> Unfortunately, I stopped tracing the logic in sufficient detail after
> this point.  I think the reordering is done before the pref lookup is
> actually carried out, and that is why the <YA, VIRAMA, RA> is rendered
> as <pre-base RA, YA>.
>
> I expressed the active parts of the GSUB table to my font compiler as:
>
> GSUB
>    script mlm2
>      language default ! List of feature entries follows - 1st 4 letters
>                       ! are feature tag
>          akhn_0 blwf_1 blws_2 half_3 haln_4 pres_5 pstf_6 psts_7 pref_rw
>      end language
>    end script
>    feature pref_rw
>        pref_lkp2 ! List of lookups for Malayalam script feature <pref>
>    end feature
> -- No lookups for other features!  (All commented out - the definitions
> -- of features without lookups are not shown in this email.)
>    lookup pref_lkp1
>        type ligature
>        subtable pref_st1
>    end lookup
>    lookup pref_lkp2
>        type chained
>        subtable pref_st2
>    end lookup
> end GSUB
>
> lookup pref_st1
>      xx r3 > r4 -- Glyphs identified by postscript names
> end lookup     -- xx for VIRAMA, r3 for RA, y1 for YA, and r4 is
>                 -- pre-base subjoined RA
> lookup pref_st2
>      | y1 xx r3 | -- No sequence indices for this context!
>      | xx r3 |
>         0 pref_lkp1
> end lookup
>
> I hesitate to try fixing the code myself - checking whether the RA is
> replaced, as opposed to whether a substitution occurs, needs good
> knowledge of HarfBuzz internals.  Also, function
> consonant_position_from_face() in the same file probably needs to be
> changed so that the font may cause any consonant to be treated as a
> pre-base subjoined form.  It looks as if a former return value of
> POS_PRE_C has been optimised away, and restoring it looks like a
> fruitful source of new errors.
>
> Who should supply the font for testing?  The test strings
> should probably be യ്രക്രഖ്രര്ര and ക്ലഖ്ലയ്ലര്ല .  I haven't looked at
> the second problem yet.
I have found that the <YA VIRAMA LA> case can be handled with a
decomposition rule using ccmp for the given context, but that looks like
a workaround to me. The <YA VIRAMA RA> case is more complex because it
involves both ligation and reordering.
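
To make the rule quoted from the specification concrete, here is a
minimal sketch of "only reorder a glyph produced by substitution during
application of <pref>". It is a toy model, not HarfBuzz code: the Glyph
struct, the from_pref flag, the functions apply_pref, reorder_pref and
shape, and the glyph name k1 (standing for KA) are all hypothetical. The
names xx, r3, r4 and y1 follow the glyph names in the quoted GSUB
listing, and each input is assumed to be a single syllable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy glyph record (hypothetical; this is not hb_glyph_info_t).
struct Glyph {
  std::string name;        // glyph name, e.g. "y1", "xx", "r3"
  bool from_pref = false;  // true only if <pref> actually produced this glyph
};

// Hypothetical stand-in for the font's <pref> lookups: ligate <xx r3> into
// r4, except after y1, where the font blocks the substitution.
void apply_pref(std::vector<Glyph>& g) {
  for (std::size_t i = 0; i + 1 < g.size(); ++i) {
    if (g[i].name != "xx" || g[i + 1].name != "r3") continue;
    if (i > 0 && g[i - 1].name == "y1") continue;  // blocked context
    g[i].name = "r4";
    g[i].from_pref = true;  // record that a substitution really happened
    g.erase(g.begin() + i + 1);
  }
}

// Reorder after <pref> has run: move to the start of the syllable only
// those glyphs that <pref> produced, never glyphs that merely could match.
void reorder_pref(std::vector<Glyph>& g) {
  for (std::size_t i = 1; i < g.size(); ++i) {
    if (!g[i].from_pref) continue;
    Glyph moved = g[i];
    g.erase(g.begin() + i);
    g.insert(g.begin(), moved);
  }
}

// Run both steps on one syllable given as a list of glyph names.
std::vector<std::string> shape(const std::vector<std::string>& names) {
  std::vector<Glyph> g;
  for (const std::string& n : names) {
    Glyph x;
    x.name = n;
    g.push_back(x);
  }
  apply_pref(g);
  reorder_pref(g);
  std::vector<std::string> out;
  for (const Glyph& x : g) out.push_back(x.name);
  return out;
}
```

In this model shape({"k1", "xx", "r3"}) gives {"r4", "k1"}, while
shape({"y1", "xx", "r3"}) leaves the sequence unchanged, because the
blocked context never sets from_pref and so nothing is reordered.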
>
> Richard.