[HarfBuzz] Contextual shaping of Malayalam post(pre)/below base forms

Mon Jul 1 23:56:18 PDT 2013

Behdad Esfahbod wrote:
> On 13-06-26 07:59 PM, Richard Wordingham wrote:
>> On Wed, 26 Jun 2013 19:09:54 -0400
>> Behdad Esfahbod <behdad at behdad.org> wrote:
>>
>>> Humm.  Ok, now I know.  In the particular sequence y1 is the base,
>>> right?  If yes, then it is NOT marked for the pref feature.  <pref>
>>> is applied to all characters after the base.
>> Glyph y1 was the base in my test case, but the problem still applies
>> if I expand the input to the unlikely sequence <PA, VIRAMA, YA,
>> VIRAMA, RA> (as in the string പ്യ്രക്രഖ്ര ).  The <VIRAMA, RA> still forms the
>> prebase ra.
>>
>>>   I believe that's why your original design wasn't working.
>> The logic to mark for the <pref> feature in 0.9.18 is
>>
>>      for (unsigned int i = base + 1; i + 1 < end; i++) {
>>        hb_codepoint_t glyphs[2] = {info[i].codepoint, info[i + 1].codepoint};
>>        if (indic_plan->pref.would_substitute (glyphs, ARRAY_LENGTH (glyphs), true, face))
>>        {
>>          info[i++].mask |= indic_plan->mask_array[PREF];
>>          info[i++].mask |= indic_plan->mask_array[PREF];
>>          ...
>>
>> in function initial_reordering_consonant_syllable() in
>> hb-ot-shape-complex-indic.cc.
>>
>> I surmise that the marking is more specific for <pref> because pre-base
>> rearrangement looks for an isolated glyph marked for <pref>; this
>> should work, because Uniscribe forbids Malayalam from having two
>> pre-base ra's in the same syllable.
> Right.  That's why, indeed.
>
>
>>> Still I'm
>>> interested to double check that Uniscribe does the same.
>> Test font is being sent off-list.
>>
>>>> I haven't yet worked out how HarfBuzz distinguishes a pre-base
>>>> <VIRAMA, RA> ligature formed by the <pref> feature from a post-base
>>>> <VIRAMA, RA> ligature formed by the <pstf> feature,
>>> It doesn't.
>> One fix would be to remove <pref> marking from consecutive pairs of
>> glyphs immediately after the <pref> feature has been applied.  I can't
>> quantify the practical benefit.
> Yeah, let's wait until someone needs that and then try to figure out what
> Uniscribe does.
>
I have now noticed that the positioning of prebase vowel marks goes awry
after applying the contextual rules for both <YA VIRAMA RA> and
<RA VIRAMA RA>. A test case is യ്ലോ (<YA> <VIRAMA> <LA> <VOWEL SIGN OO>),
as in കൊയ്ലോ. The prebase vowel mark is now positioned in front of YA,
maybe because the shaper still thinks YA is the base. In fact LA/RA
should be the new base if the y1 | xx r3 | rule applies.
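
To illustrate the suspicion about the base, here is a toy C++ sketch. It is not HarfBuzz code. It assumes, purely for illustration, that the prebase part of the vowel sign ends up immediately before whichever glyph the shaper regards as the base. Syllable, place_prebase_matra, join and the glyph name PREBASE_MATRA are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: glyphs are plain strings, not hb_glyph_info_t.
using Syllable = std::vector<std::string>;

// Hypothetical helper: place a prebase vowel mark immediately before the
// glyph at index `base`. This is an assumed simplification of the real
// reordering, which has more cases.
Syllable place_prebase_matra(const Syllable &consonants,
                             std::size_t base,
                             const std::string &matra)
{
  assert(base < consonants.size());
  const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(base);

  // Copy everything before the base, then the matra, then the rest.
  Syllable out(consonants.begin(), consonants.begin() + pos);
  out.push_back(matra);
  out.insert(out.end(), consonants.begin() + pos, consonants.end());
  return out;
}

// Hypothetical helper: join glyph names with spaces for display.
std::string join(const Syllable &s)
{
  std::string result;
  for (std::size_t i = 0; i < s.size(); i++) {
    if (i > 0)
      result += ' ';
    result += s[i];
  }
  return result;
}
```

For the cluster <YA, VIRAMA, LA>, calling place_prebase_matra with base index 0 (YA) gives PREBASE_MATRA YA VIRAMA LA, which is the misplacement described above. With base index 2 (LA) it gives YA VIRAMA PREBASE_MATRA LA, which is what I would expect if LA were treated as the new base.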