[HarfBuzz] Unicode ignorables treatment

Nikolay Sivov bunglehead at gmail.com
Tue Jul 28 03:50:41 PDT 2015

On 28.07.2015 13:25, Adam Twardoch (List) wrote:
> This is great news! I cannot answer your question but it's great that HB is moving to become the Wine DW shaper. Do you also plan to use it with the Wine Uniscribe.dll replacement?

That's a good question. It's not an immediate goal, and I'm not sure
there's an easy way to do it that will be compatible with what
applications expect; let's say DW is more Unicode-friendly. Another thing
is that Uniscribe is meant to be used in conjunction with GDI only, and
that's what we do - all font access goes through the public GDI API, and
I'm not sure how thin an HB font wrapper would be in this case.

Even with DW my idea so far is to use only glyph data from HB, and
implement the GPOS part manually. There are at least a couple of reasons
for that:

- The DW API is split into two methods, GetGlyphs() and
GetGlyphPlacements()/GetGdiCompatibleGlyphPlacements(); the first one is
used to get glyph indices/cluster info, the second ones to get advances
and offsets. There's no API way to pass any context between the two,
which means I'll have to either invent a shared cache so I don't have to
call hb_shape() twice, or reimplement the second part myself;

- DW supports several measurement modes, roughly hinted/not-hinted to
keep it simple; all positions are floating point, either rounded to
pixels or not, so I'll need to account for that too, maybe not in a first
iteration, but it's still important to keep in mind.
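
To illustrate the shared-cache option from the first point, here is a
minimal sketch (not Wine code) of a single-entry cache that lets the
glyph call and the placement call share one shaping pass. All names in it
(shape_cache, cache_shape, fake_shape) are hypothetical; fake_shape only
stands in for a real wrapper around hb_shape(), and a real key would
probably also have to cover features, locale and direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_MAX_LEN 64

/* Output of one shaping pass: what both API halves need. */
struct shape_result {
    size_t   count;
    uint16_t glyphs[CACHE_MAX_LEN];
    int32_t  advances[CACHE_MAX_LEN];
};

/* Remembers the last request (font, script, text) and its result. */
struct shape_cache {
    int                 valid;
    const void         *font;
    uint32_t            script;
    size_t              len;
    uint16_t            text[CACHE_MAX_LEN];
    struct shape_result result;
};

typedef void (*shape_fn)(const uint16_t *text, size_t len,
                         struct shape_result *out);

/* Stand-in shaper (assumption): one glyph per code unit, fixed advance. */
static unsigned int fake_shape_calls;
static void fake_shape(const uint16_t *text, size_t len,
                       struct shape_result *out)
{
    (void)text;
    fake_shape_calls++;
    out->count = len;
    for (size_t i = 0; i < len; i++) {
        out->glyphs[i] = (uint16_t)(i + 1);
        out->advances[i] = 10;
    }
}

/* Returns the cached result if the request matches the previous one,
   otherwise runs the shaper once and remembers its output. */
static const struct shape_result *cache_shape(struct shape_cache *cache,
        const void *font, uint32_t script,
        const uint16_t *text, size_t len, shape_fn shape)
{
    assert(len <= CACHE_MAX_LEN);

    if (cache->valid && cache->font == font && cache->script == script &&
            cache->len == len &&
            memcmp(cache->text, text, len * sizeof(*text)) == 0)
        return &cache->result;

    cache->valid = 1;
    cache->font = font;
    cache->script = script;
    cache->len = len;
    memcpy(cache->text, text, len * sizeof(*text));
    shape(text, len, &cache->result);
    return &cache->result;
}
```

With something like this, the glyph method would fill indices from the
result and the placement method would pick up advances from the same
entry, as long as both are called with the same text, font and script.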

> Adam
> Sent from my mobile phone.
>> On 28.07.2015, at 11:46, Nikolay Sivov <bunglehead at gmail.com> wrote:
>> Hello,
>> I had a brief chat with Behdad on irc the other day, and he suggested to move it here.
>> I'm currently considering using hb as a shaper in Wine's DirectWrite replacement; I put together a simple prototype that uses harfbuzz with freetype face instances to get resulting glyph indices, and so far it works fine on the Persian sample I'm trying it with. However, my goal is to match DirectWrite behavior as closely as possible, so it came to running some existing tests we have, and the first thing that popped up was control code treatment, specifically the LRE/PDF sequence. According to tests that we run on all supported Windows versions (for this particular case that means Vista+), the shaping method returns index 0 (the .notdef glyph) for control code positions, and yes, the placement method that comes after returns non-zero advances for those, as in many fonts .notdef is rendered as an empty box.
>> If you're unfamiliar with that part of the DirectWrite API, basically what happens is that you call the "IDWriteTextAnalyzer::GetGlyphs" method, passing text, a font instance and a script id to it; in return it gives you glyph indices, separately fills glyph properties and properties for the initial text codepoints, and also fills the clustermap array. The current relevant test code is here [1]. As you can see, the call succeeds, and the index is 0 in both cases, with the isZeroWidthSpace flag NOT set.
>> Later on TextLayout API level such clusters are indeed treated differently - they have 0 cluster width, but again it's another story, and low level shaping API is exposed to applications.
>> With hb when no script was set explicitly to a buffer I get 0xfffe as glyph 'codepoint' value, which is fine, as I can filter it out easily. If I set some script id, like Arabic in my test case those codepoints will be replaced with space glyph of zero advance. So the question is how can I reliably detect those so I can fixup returned glyph indices in a way I'd like? Does it sound reasonable to track back to text point using 'cluster' value and check for it explicitly? It's a bit ugly, but I'm prepared to do that, if it will always work.
>> P.S. for the ones interested, dwrite shaping API is different in some other aspects too, namely:
>> - cluster mapping, dwrite maps text point to glyph index, while hb maps in opposite direction (I simply remapped it manually, not sure if there's an api for that);
>> - directionality handling, hb returns glyphs in order they appear regardless of direction, so 0 glyph is leftmost, no matter if it's logically first or last in a run; dwrite always keeps logical order AND additional flag so you can tell if run is RTL (I used hb_buffer_reverse() in RTL case for that);
>> Nikolay
>> [1] http://source.winehq.org/git/wine.git/blob/HEAD:/dlls/dwrite/tests/analyzer.c#l1278
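
The two workarounds described in the quoted message (tracking back to the
text via the 'cluster' value, and remapping clusters from glyph-to-text
into text-to-glyph) can be sketched as below. This is an illustration
under assumptions, not HarfBuzz or Wine code: function names are
hypothetical, glyphs are taken to be in logical order already, cluster
values are taken to be indices into a UTF-32 text array, and only
U+202A..U+202E are checked, since the tests mentioned above cover just
LRE/PDF. If the shaper merges a control code into a larger cluster, this
check only sees the first code point of that cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LRE, RLE, PDF, LRO, RLO. Which code points DirectWrite maps to glyph 0
   beyond LRE/PDF is an assumption here. */
static int is_explicit_bidi_control(uint32_t cp)
{
    return cp >= 0x202a && cp <= 0x202e;
}

/* Sets the glyph index to 0 (.notdef) for every glyph whose cluster
   starts at a control code. Returns the number of glyphs changed. */
static size_t fixup_control_glyphs(const uint32_t *text, size_t text_len,
        uint32_t *glyphs, const uint32_t *clusters, size_t glyph_count)
{
    size_t fixed = 0;
    for (size_t i = 0; i < glyph_count; i++) {
        assert(clusters[i] < text_len);
        if (is_explicit_bidi_control(text[clusters[i]])) {
            glyphs[i] = 0;
            fixed++;
        }
    }
    return fixed;
}

/* Builds a text-to-glyph map: clustermap[t] is the index of the first
   glyph of the cluster that contains text position t. */
static void build_cluster_map(const uint32_t *clusters, size_t glyph_count,
        uint16_t *clustermap, size_t text_len)
{
    const uint16_t unset = 0xffff;
    uint16_t last = 0;

    for (size_t t = 0; t < text_len; t++)
        clustermap[t] = unset;

    /* Walking backwards, the last write per cluster is its first glyph. */
    for (size_t g = glyph_count; g-- > 0; ) {
        assert(clusters[g] < text_len);
        clustermap[clusters[g]] = (uint16_t)g;
    }

    /* Positions that start no cluster belong to the preceding one. */
    for (size_t t = 0; t < text_len; t++) {
        if (clustermap[t] == unset)
            clustermap[t] = last;
        else
            last = clustermap[t];
    }
}
```

For example, glyph clusters {0, 0, 1, 3} over four text positions give a
cluster map of {0, 2, 2, 3}.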
