[HarfBuzz] Unicode ignorables treatment

Nikolay Sivov bunglehead at gmail.com
Tue Jul 28 02:46:20 PDT 2015


I had a brief chat with Behdad on IRC the other day, and he suggested 
moving the discussion here.

I'm currently considering using hb as a shaper in Wine's DirectWrite 
replacement. I put together a simple prototype that uses HarfBuzz with 
FreeType face instances to get the resulting glyph indices, and so far 
it works fine on the Persian sample I'm trying it with. However, my goal 
is to match DirectWrite behavior as closely as possible, so I started 
running some of our existing tests, and the first thing that popped up 
was the treatment of control codes, specifically the LRE/PDF sequence. 
According to tests that we run on all supported Windows versions (for 
this particular case that means Vista+), the shaping method returns 
index 0 (the .notdef glyph) for control code positions. And yes, the 
placement method that comes after it returns non-zero advances for 
those, since in many fonts .notdef is drawn as an empty box.

If you're unfamiliar with that part of the DirectWrite API, basically 
what happens is that you call the "IDWriteTextAnalyzer::GetGlyphs" 
method, passing it the text, a font instance and a script id. In return 
it gives you glyph indices, separately fills in glyph properties and 
properties for the initial text codepoints, and also fills the cluster 
map array. The current relevant test code is here [1]. As you can see, 
the call succeeds and the index is 0 in both cases, with the 
isZeroWidthSpace flag NOT set.

Later on, at the TextLayout API level, such clusters are indeed treated 
differently: they have 0 cluster width. But that's another story, and 
the low-level shaping API is exposed to applications.

With hb, when no script is set explicitly on the buffer, I get 0xfffe as 
the glyph 'codepoint' value, which is fine, as I can filter it out 
easily. If I set some script, like Arabic in my test case, those 
codepoints are replaced with a space glyph of zero advance. So the 
question is: how can I reliably detect those, so that I can fix up the 
returned glyph indices the way I'd like? Does it sound reasonable to 
track back to the text position using the 'cluster' value and check for 
the control code explicitly? It's a bit ugly, but I'm prepared to do 
that if it will always work.
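
To make the idea concrete, here is a rough sketch of the "track back via 
cluster" check. It is only an illustration under several assumptions: 
the text was handed to the shaper as an array of UTF-32 codepoints, so 
each 'cluster' value indexes directly into that array; every control 
code ends up in a cluster of its own; and only the Bidi_Control 
characters need patching. The shaped_glyph struct and both function 
names are hypothetical stand-ins, not HarfBuzz or Wine API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two hb_glyph_info_t fields used here. */
typedef struct {
    uint32_t glyph_id; /* glyph index after shaping ('codepoint' in hb) */
    uint32_t cluster;  /* assumed: index of the source character in text */
} shaped_glyph;

/* Returns nonzero for characters with the Unicode Bidi_Control property. */
static int is_bidi_control(uint32_t cp)
{
    return cp == 0x061C                     /* ALM */
        || cp == 0x200E || cp == 0x200F     /* LRM, RLM */
        || (cp >= 0x202A && cp <= 0x202E)   /* LRE, RLE, PDF, LRO, RLO */
        || (cp >= 0x2066 && cp <= 0x2069);  /* LRI, RLI, FSI, PDI */
}

/* Sets the glyph index to 0 (.notdef) for every glyph whose cluster
 * starts at a bidi control character. Returns the number of glyphs
 * that were changed. */
static size_t replace_control_glyphs(const uint32_t *text, size_t text_len,
                                     shaped_glyph *glyphs, size_t glyph_count)
{
    size_t replaced = 0;

    for (size_t i = 0; i < glyph_count; i++) {
        uint32_t c = glyphs[i].cluster;

        /* Skip out-of-range cluster values instead of reading past text. */
        if (c < text_len && is_bidi_control(text[c])) {
            glyphs[i].glyph_id = 0;
            replaced++;
        }
    }
    return replaced;
}
```

If the shaper merges a control code into a neighbouring cluster, this 
check would either miss it or patch too many glyphs, so whether it 
"always works" depends on how clusters are formed.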

P.S. For those interested, the dwrite shaping API differs in some other 
aspects too, namely:

- cluster mapping: dwrite maps each text position to a glyph index, 
while hb maps in the opposite direction (I simply remapped it manually, 
not sure if there's an API for that);
- directionality handling: hb returns glyphs in the order they appear 
visually regardless of direction, so glyph 0 is the leftmost one, no 
matter if it's logically first or last in a run; dwrite always keeps 
logical order AND an additional flag so you can tell if the run is RTL 
(I used hb_buffer_reverse() in the RTL case for that).
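
For reference, the manual remapping mentioned in the first item can be 
sketched roughly like this. It assumes the glyphs are already in logical 
order (that is, after the reversal for RTL runs), that their cluster 
values are non-decreasing and index text positions directly, and that 
the output is a UINT16-style array with one entry per text position, as 
GetGlyphs fills. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds a text-position -> first-glyph-index map from per-glyph cluster
 * values. glyph_clusters[g] is the text position at which the cluster
 * containing glyph g starts. Entries of cluster_map that no cluster
 * covers are left untouched. */
static void build_cluster_map(const uint32_t *glyph_clusters,
                              size_t glyph_count,
                              uint16_t *cluster_map, size_t text_len)
{
    size_t g = 0;

    while (g < glyph_count) {
        size_t first = g;                  /* first glyph of this cluster */
        uint32_t start = glyph_clusters[g];

        /* Skip the remaining glyphs that belong to the same cluster. */
        while (g < glyph_count && glyph_clusters[g] == start)
            g++;

        /* The cluster covers text up to the start of the next cluster,
         * or up to the end of the text for the last one. */
        size_t end = (g < glyph_count) ? glyph_clusters[g] : text_len;

        for (size_t i = start; i < end && i < text_len; i++)
            cluster_map[i] = (uint16_t)first;
    }
}
```

With glyph clusters {0, 0, 1, 3} over four text positions, position 0 
maps to glyph 0, positions 1 and 2 both map to glyph 2, and position 3 
maps to glyph 3.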


