[HarfBuzz] 'mark' feature in myanmar

Jonathan Kew jfkthame at googlemail.com
Thu Jul 18 12:15:19 PDT 2013


On 18/7/13 19:21, Behdad Esfahbod wrote:
> On 13-07-18 02:19 PM, Jonathan Kew wrote:
>> Hey Behdad,
>>
>> In hb-ot-shape-complex-myanmar.cc, we have the code fragment
>>
>>    /*
>>     * Note:
>>     *
>>     * Spec says 'mark' is used, and the mmrtext.ttf font from
>>     * Windows 8 has lookups for it.  But testing suggests that
>>     * Windows 8 Uniscribe is NOT applying it.  It *is* applying
>>     * 'mkmk' however.
>>     */
>>    if (hb_options ().uniscribe_bug_compatible)
>>      plan->map.add_feature (HB_TAG('m','a','r','k'), 0, F_GLOBAL);
>>
>> which disables the 'mark' feature in Uniscribe-compatible mode.
>>
>> However, AFAICT from my current testing, it looks like Uniscribe on Win8 *is*
>> applying the 'mark' feature, and so this is resulting in tons of unwanted
>> discrepancies when I try to compare stuff. (This is when testing with the
>> mmrtext.ttf font from Windows 8.)
>
> Interesting.  IIRC we were getting 100% parity in February with that same
> font, right?

According to the commit message for 1c8654ead41ca746d577549c92d2a41c594ab639,
we were seeing 15 differences on the Wikipedia corpus, although we
believed those were cases where Uniscribe was wrong.

Currently, I'm seeing rather more than that (even after accounting for 
'mark'), so I still need to investigate further... maybe I'm doing 
something else wrong.

> Got a sample sequence with expected / current
> output?

A simple sequence such as

   <U+101B,U+1000,U+103A>

is currently giving me

   [gid262=0+1304|gid235=1+2186|gid370=1@9,0+0]

with the uniscribe backend on Win8.

The position adjustment of the last glyph is the result of the 'mark'
feature. But in "uniscribe-bug-compatible" mode, HarfBuzz doesn't apply
'mark', and so it returns

   [gid262=0+1304|gid235=1+2186|gid370=1+0]

instead.
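
For comparing results like these mechanically, something along the following lines might help. It is only a rough sketch, not part of HarfBuzz: the Glyph struct and the parse_glyph, parse_glyphs and offset_diffs helpers are hypothetical names made up for illustration. It assumes the serialized form name=cluster@dx,dy+advance (with the @dx,dy part omitted when the offsets are zero), and reports which glyphs differ in their offsets.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type for illustration; not a HarfBuzz API.
struct Glyph {
  std::string name;
  int cluster = 0;
  int x_offset = 0, y_offset = 0;
  int x_advance = 0;
};

// Parse one item of the assumed form  name=cluster[@dx,dy]+advance
static bool parse_glyph (const std::string &item, Glyph &g)
{
  size_t eq = item.find ('=');
  size_t plus = item.rfind ('+');
  if (eq == std::string::npos || plus == std::string::npos || plus < eq)
    return false;
  g.name = item.substr (0, eq);
  std::string mid = item.substr (eq + 1, plus - eq - 1);
  g.x_advance = std::stoi (item.substr (plus + 1));
  size_t at = mid.find ('@');
  g.cluster = std::stoi (mid.substr (0, at));  // whole string if no '@'
  if (at != std::string::npos) {
    size_t comma = mid.find (',', at);
    if (comma == std::string::npos)
      return false;
    g.x_offset = std::stoi (mid.substr (at + 1, comma - at - 1));
    g.y_offset = std::stoi (mid.substr (comma + 1));
  }
  return true;
}

// Split "[a|b|c]" into glyph records, skipping items that fail to parse.
static std::vector<Glyph> parse_glyphs (std::string s)
{
  std::vector<Glyph> out;
  if (!s.empty () && s.front () == '[') s.erase (0, 1);
  if (!s.empty () && s.back () == ']') s.pop_back ();
  std::istringstream in (s);
  std::string item;
  while (std::getline (in, item, '|')) {
    Glyph g;
    if (parse_glyph (item, g))
      out.push_back (g);
  }
  return out;
}

// Indices (within the common prefix) of glyphs whose offsets differ.
static std::vector<size_t> offset_diffs (const std::vector<Glyph> &a,
                                         const std::vector<Glyph> &b)
{
  std::vector<size_t> d;
  for (size_t i = 0; i < a.size () && i < b.size (); i++)
    if (a[i].x_offset != b[i].x_offset || a[i].y_offset != b[i].y_offset)
      d.push_back (i);
  return d;
}
```

Fed the two results above, offset_diffs should flag only the third glyph (gid370), which is the one that 'mark' positions.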




