[HarfBuzz] Unicode ignorables treatment
Nikolay Sivov
bunglehead at gmail.com
Tue Jul 28 02:46:20 PDT 2015
Hello,
I had a brief chat with Behdad on IRC the other day, and he suggested
moving the discussion here.
I'm currently considering using hb as a shaper in Wine's DirectWrite
replacement. I put together a simple prototype that uses HarfBuzz with
FreeType face instances to get the resulting glyph indices; so far it
works fine on the Persian sample I'm trying it with. However, my goal is
to match DirectWrite behavior as closely as possible, so I ran some of
the existing tests we have, and the first thing that popped up was the
treatment of control codes, specifically the LRE/PDF sequence. According
to tests that we run on all supported Windows versions (for this
particular case that means Vista+), the shaping method returns index 0
(the .notdef glyph) for control code positions, and yes, the placement
method that comes after it returns non-zero advances for those, since in
many fonts .notdef is drawn as an empty box.
If you're unfamiliar with that part of the DirectWrite API, basically
what happens is that you call the "IDWriteTextAnalyzer::GetGlyphs"
method, passing it the text, a font instance, and a script id; in return
it gives you glyph indices, and separately fills in glyph properties and
properties for the initial text codepoints. It also fills the cluster
map array. The current relevant test code is here [1]. As you can see,
the call succeeds, and the index is 0 in both cases, with the
isZeroWidthSpace flag NOT set.
Later, at the TextLayout API level, such clusters are indeed treated
differently (they have a cluster width of 0), but that's another story,
and the low-level shaping API is itself exposed to applications.
With hb, when no script is set explicitly on the buffer, I get 0xfffe as
the glyph 'codepoint' value, which is fine, as I can filter it out
easily. If I set a script, like Arabic in my test case, those codepoints
are replaced with a space glyph of zero advance. So the question is: how
can I reliably detect those so I can fix up the returned glyph indices
the way I'd like? Does it sound reasonable to track back to the text
position using the 'cluster' value and check for it explicitly? It's a
bit ugly, but I'm prepared to do that, if it will always work.
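
To make the cluster-based check concrete, here is a minimal sketch in
plain C. It does not call HarfBuzz itself; it assumes the text is an array
of UTF-32 codepoints and that each glyph's 'cluster' value is an index
into that array (which would depend on how the text was added to the
buffer). The names is_bidi_control() and zero_control_glyphs() are
hypothetical helpers made up for illustration, and the codepoint list
covers only the Unicode Bidi_Control characters, not all default
ignorables.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp has the Unicode Bidi_Control property:
 * ALM, LRM, RLM, LRE..RLO (includes PDF), LRI..PDI. */
int is_bidi_control(uint32_t cp)
{
    return cp == 0x061C ||
           cp == 0x200E || cp == 0x200F ||
           (cp >= 0x202A && cp <= 0x202E) ||
           (cp >= 0x2066 && cp <= 0x2069);
}

/* Hypothetical fixup: for every glyph whose cluster value points at a
 * bidi control in the source text, force the glyph index to 0 (.notdef).
 * Only the first text position of each cluster is inspected, so a
 * control code merged into a larger cluster would not be seen.
 * Returns the number of glyphs that were changed. */
size_t zero_control_glyphs(const uint32_t *text, size_t text_len,
                           const uint32_t *clusters,
                           uint16_t *glyphs, size_t glyph_count)
{
    size_t i, changed = 0;

    for (i = 0; i < glyph_count; i++) {
        uint32_t c = clusters[i];
        /* Skip cluster values that fall outside the text. */
        if (c < text_len && is_bidi_control(text[c])) {
            glyphs[i] = 0;
            changed++;
        }
    }
    return changed;
}
```

Whether this is reliable in all cases is exactly the open question above;
the sketch only shows the mechanics of mapping back through 'cluster'.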
P.S. For those interested, the dwrite shaping API differs in some other
aspects too, namely:
- cluster mapping: dwrite maps each text position to a glyph index, while
hb maps in the opposite direction (I simply remapped it manually, not
sure if there's an API for that);
- directionality handling: hb returns glyphs in the order they appear
visually regardless of direction, so glyph 0 is the leftmost, no matter
whether it's logically first or last in a run; dwrite always keeps
logical order AND provides an additional flag so you can tell if a run is
RTL (I used hb_buffer_reverse() in the RTL case for that).
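
The manual cluster remapping mentioned in the first point could look
roughly like the sketch below. It is plain C with no HarfBuzz calls, and
it assumes the per-glyph cluster values are already in logical order
(that is, after any reversal for RTL) and are indices into the text. The
function name build_cluster_map() and the UNSET sentinel are hypothetical,
and the output layout (one glyph index per text position, pointing at the
first glyph of the cluster) is my reading of the dwrite cluster map, not
something confirmed by this post.

```c
#include <stddef.h>
#include <stdint.h>

#define UNSET UINT16_MAX  /* sentinel: no glyph recorded for this position yet */

/* Hypothetical helper: turn hb-style "glyph -> text position" cluster
 * values into a dwrite-style "text position -> first glyph" map.
 * glyph_clusters must be in logical (non-decreasing) order. */
void build_cluster_map(const uint32_t *glyph_clusters, size_t glyph_count,
                       uint16_t *cluster_map, size_t text_len)
{
    size_t i;
    uint16_t last = 0;

    for (i = 0; i < text_len; i++)
        cluster_map[i] = UNSET;

    /* Record the first glyph seen for each cluster start. */
    for (i = 0; i < glyph_count; i++) {
        uint32_t c = glyph_clusters[i];
        if (c < text_len && cluster_map[c] == UNSET)
            cluster_map[c] = (uint16_t)i;
    }

    /* Text positions inside a cluster inherit the glyph index of the
     * cluster start that precedes them. */
    for (i = 0; i < text_len; i++) {
        if (cluster_map[i] == UNSET)
            cluster_map[i] = last;
        else
            last = cluster_map[i];
    }
}
```

For example, four glyphs with cluster values 0, 0, 2, 3 over five text
positions would give the map 0, 0, 2, 3, 3.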
Nikolay
[1]
http://source.winehq.org/git/wine.git/blob/HEAD:/dlls/dwrite/tests/analyzer.c#l1278