[HarfBuzz] Mark zeroing for East Asian scripts

Behdad Esfahbod behdad at behdad.org
Tue Jan 27 14:06:16 PST 2015


On 15-01-27 02:26 AM, Jonathan Kew wrote:
> On 26/1/15 22:53, Behdad Esfahbod wrote:
>> Jonathan,
>>
>> Trying with sequence of U+308F,3099,308F with NotoSansJP, looks like Uniscribe
>> doesn't zero the mark advance but we do.  The font has bad data for this mark,
>> but I want to fix the logic discrepancy.
>>
>> So I'll probably add a new (null-ish) shaper for HIRAGANA, KATAKANA, and HAN,
>> and use HB_OT_SHAPE_ZERO_WIDTH_MARKS_NONE.
>>
>> WDYT?  What to call that shaper?
> 
> Seems reasonable to me. In practice, it looks like when fonts support the
> combining versions of the sound marks at all (some don't include them), they
> do it by ligating them with the preceding kana character, rather than just
> positioning a mark on the standard glyph (which would often clash, or at least
> look excessively crowded).
> 
> As such, zeroing the "mark" advance for a font that didn't do the ligation is
> not likely to be useful. So if Uniscribe doesn't do it (and neither does
> Cocoa/Core Text, afaict), let's just follow their lead.

Ok, CoreText does actually do something.  I think it's using heuristics based
on the mark outlines...  Whereas with the ot shaper we currently get:

$ hb-unicode-encode 308F 3099 308F | hb-shape NotoSansJP-Regular.otf
[gid1275=0+1000|gid1283=0+0|gid1275=2+1000]

with Uniscribe:

$ hb-unicode-encode 308F 3099 308F | hb-shape NotoSansJP-Regular.otf --shaper uniscribe
[gid1275=0+1000|gid1283=0+1000|gid1275=2+1000]

and CoreText:

$ hb-unicode-encode 308F 3099 308F | hb-shape NotoSansJP-Regular.otf --shaper coretext
[gid1275=0+1000|gid1283=0@0,-34+367|gid1275=2+1000]
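
To compare the three results side by side, the hb-shape lines above can be read as glyph=cluster@x_offset,y_offset+x_advance, with the offset part omitted when it is zero. The following is only a minimal sketch of a reader for that horizontal form; the names Glyph and parse_hb_shape are made up for illustration and are not part of HarfBuzz, and vertical advances or other output flags are not handled.

```python
import re
from typing import NamedTuple


class Glyph(NamedTuple):
    name: str
    cluster: int
    x_offset: int
    y_offset: int
    x_advance: int


# One item of hb-shape's default output: name=cluster[@dx,dy]+advance
_ITEM = re.compile(
    r"^(?P<name>[^=]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)$"
)


def parse_hb_shape(line):
    """Parse one '[a=0+1000|b=0@0,-34+367]' style line into Glyph tuples."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        m = _ITEM.match(item)
        if m is None:
            raise ValueError(f"unrecognised item: {item!r}")
        glyphs.append(
            Glyph(
                m["name"],
                int(m["cluster"]),
                int(m["dx"] or 0),  # a missing offset means zero
                int(m["dy"] or 0),
                int(m["adv"]),
            )
        )
    return glyphs


# The three outputs quoted in this message, keyed by shaper.
results = {
    "ot": "[gid1275=0+1000|gid1283=0+0|gid1275=2+1000]",
    "uniscribe": "[gid1275=0+1000|gid1283=0+1000|gid1275=2+1000]",
    "coretext": "[gid1275=0+1000|gid1283=0@0,-34+367|gid1275=2+1000]",
}

for shaper, line in results.items():
    mark = parse_hb_shape(line)[1]  # gid1283 is the U+3099 mark
    print(shaper, mark.x_advance, (mark.x_offset, mark.y_offset))
```

Only the middle glyph (the mark) differs between the three shapers; the base glyphs keep an advance of 1000 in every case.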


So, we have to choose to match Uniscribe (my current preference), or implement
heuristics (based on glyph extents?).
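
A rough illustration of the first option (matching Uniscribe), written as a standalone Python sketch rather than actual HarfBuzz code. The enum, the script set, and both function names are hypothetical; in particular, the assumption that every other script keeps zeroing mark advances is mine, made only so the example has a contrast case.

```python
from enum import Enum


class ZeroWidthMarks(Enum):
    # Hypothetical stand-ins for shaper policy values such as
    # HB_OT_SHAPE_ZERO_WIDTH_MARKS_NONE mentioned earlier in the thread.
    NONE = "none"
    ZERO = "zero"


# ISO 15924 tags for the scripts proposed for the new null-ish shaper.
NO_ZEROING_SCRIPTS = {"Hira", "Kana", "Hani"}


def zero_width_mode(script):
    """Pick the mark-zeroing policy for a script (assumed default: ZERO)."""
    if script in NO_ZEROING_SCRIPTS:
        return ZeroWidthMarks.NONE
    return ZeroWidthMarks.ZERO


def apply_mode(advances, is_mark, mode):
    """Return new advances, zeroing marks unless the mode is NONE."""
    if mode is ZeroWidthMarks.NONE:
        return list(advances)
    return [0 if mark else adv for adv, mark in zip(advances, is_mark)]


# U+308F U+3099 U+308F with the font's own advances (as Uniscribe reports them).
font_advances = [1000, 1000, 1000]
marks = [False, True, False]

print(apply_mode(font_advances, marks, zero_width_mode("Hira")))
print(apply_mode(font_advances, marks, zero_width_mode("Latn")))
```

With the kana policy the mark keeps the font's advance, as in the Uniscribe output; with the assumed default it is zeroed, as in the current ot output.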

> Call it hb_ot_complex_shaper_eastasian? hb_ot_complex_shaper_han_kana?
> 
> JK
> 
> 

-- 
behdad
http://behdad.org/

