[HarfBuzz] Unicode ignorables treatment

Adam Twardoch (List) list.adam at twardoch.com
Tue Jul 28 03:25:45 PDT 2015


This is great news! I cannot answer your question, but it's great that HB is moving to become the Wine DW shaper. Do you also plan to use it with the Wine Uniscribe.dll replacement?

Adam


> On 28.07.2015, at 11:46, Nikolay Sivov <bunglehead at gmail.com> wrote:
> 
> Hello,
> 
> I had a brief chat with Behdad on IRC the other day, and he suggested moving the discussion here.
> 
> I'm currently considering using HarfBuzz as the shaper in Wine's DirectWrite replacement. I put together a simple prototype that uses HarfBuzz with FreeType face instances to get the resulting glyph indices, and so far it works fine on the Persian sample I'm testing with. However, my goal is to match DirectWrite behavior as closely as possible, so I ran some of our existing tests, and the first thing that popped up was the treatment of control codes, specifically the LRE/PDF sequence. According to tests that we run on all supported Windows versions (for this particular case that means Vista+), the shaping method returns index 0 (the .notdef glyph) for control code positions, and yes, the placement method that comes after it returns non-zero advances for those, since in many fonts .notdef is drawn as an empty box.
> 
> If you're unfamiliar with that part of the DirectWrite API, basically you call the "IDWriteTextAnalyzer::GetGlyphs" method, passing it the text, font instance, and script id; in return it gives you glyph indices, separately fills in glyph properties and properties for the initial text codepoints, and also fills in the clustermap array. The current relevant test code is here [1]. As you can see, the call succeeds and the index is 0 in both cases, with the isZeroWidthSpace flag NOT set.
> 
> Later on, at the TextLayout API level, such clusters are indeed treated differently (they have 0 cluster width), but that's another story, and the low-level shaping API is exposed to applications.
> 
> With hb, when no script is set explicitly on the buffer, I get 0xfffe as the glyph 'codepoint' value, which is fine, as I can filter it out easily. If I set some script id, like Arabic in my test case, those codepoints are replaced with a space glyph of zero advance. So the question is: how can I reliably detect those so that I can fix up the returned glyph indices the way I'd like? Does it sound reasonable to track back to the text position using the 'cluster' value and check for it explicitly? It's a bit ugly, but I'm prepared to do that if it will always work.
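The "track back through the cluster value" idea above could look roughly like the sketch below. It is not HarfBuzz code: the shaper output is modelled as a plain list of (glyph_id, cluster) pairs, the function name fixup_control_glyphs is hypothetical, and it assumes cluster values are indices into the same sequence of code points that was handed to the shaper. Only the explicit bidi formatting controls (LRE, RLE, PDF, LRO, RLO and the isolate controls) are checked, and only the character at the start of each cluster is inspected, so a control merged into a larger cluster would be missed.

```python
# Explicit bidi formatting controls: U+202A..U+202E (LRE, RLE, PDF, LRO, RLO)
# and U+2066..U+2069 (LRI, RLI, FSI, PDI). This set is an assumption chosen
# for the LRE/PDF case; it is not the full list of default-ignorable characters.
BIDI_CONTROLS = frozenset(range(0x202A, 0x202F)) | frozenset(range(0x2066, 0x206A))


def fixup_control_glyphs(text, glyphs):
    """Return a copy of `glyphs` with glyph id 0 at control-code positions.

    text   -- the shaped string; cluster values are assumed to index into it
    glyphs -- list of (glyph_id, cluster) pairs, a stand-in for shaper output
    """
    fixed = []
    for glyph_id, cluster in glyphs:
        # Look up the character that starts this cluster in the original text.
        if ord(text[cluster]) in BIDI_CONTROLS:
            glyph_id = 0  # report .notdef for the control code
        fixed.append((glyph_id, cluster))
    return fixed


# Example: LRE, two letters, PDF; glyph id 3 stands in for the space glyph.
sample_text = "\u202aab\u202c"
sample_glyphs = [(3, 0), (68, 1), (69, 2), (3, 3)]
fixed_glyphs = fixup_control_glyphs(sample_text, sample_glyphs)
```

Whether this works in every case depends on how clusters are formed for the script in question, which is exactly the open question here.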
> 
> P.S. For those interested, the dwrite shaping API is different in some other aspects too, namely:
> 
> - cluster mapping: dwrite maps text position to glyph index, while hb maps in the opposite direction (I simply remapped it manually; not sure if there's an API for that);
> - directionality handling: hb returns glyphs in the order they appear regardless of direction, so glyph 0 is the leftmost, no matter whether it's logically first or last in a run; dwrite always keeps logical order AND an additional flag so you can tell if the run is RTL (I used hb_buffer_reverse() in the RTL case for that).
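The two conversions in the list above can be sketched as follows. This is an illustration only, with hypothetical helper names (to_logical_order, build_cluster_map) and the shaper output again modelled as (glyph_id, cluster) pairs. It assumes that reversing an RTL run yields non-decreasing cluster values, and that a text position with no glyph of its own belongs to the preceding cluster.

```python
def to_logical_order(glyphs, is_rtl):
    """Return glyphs in logical order; an RTL run arrives in visual order."""
    # A plain reversal also flips the glyph order inside multi-glyph clusters,
    # which this sketch accepts as a simplification.
    return list(reversed(glyphs)) if is_rtl else list(glyphs)


def build_cluster_map(text_len, glyphs):
    """Map each text position to the index of the first glyph of its cluster.

    glyphs -- (glyph_id, cluster) pairs in logical order
    """
    cluster_map = [None] * text_len
    for glyph_index, (_glyph_id, cluster) in enumerate(glyphs):
        # Keep only the first glyph seen for each cluster start.
        if cluster_map[cluster] is None:
            cluster_map[cluster] = glyph_index
    # Positions that produced no glyph (e.g. the tail of a ligature)
    # inherit the glyph index of the preceding cluster.
    last = 0
    for pos in range(text_len):
        if cluster_map[pos] is None:
            cluster_map[pos] = last
        else:
            last = cluster_map[pos]
    return cluster_map


# Example: a 4-character RTL run where text positions 1 and 2 form a ligature.
visual = [(13, 3), (12, 1), (11, 0)]
logical = to_logical_order(visual, is_rtl=True)
text_to_glyph = build_cluster_map(4, logical)
```

In this example the logical glyph list is [(11, 0), (12, 1), (13, 3)], and text positions 1 and 2 both map to glyph index 1.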
> 
> Nikolay
> 
> 
> [1] http://source.winehq.org/git/wine.git/blob/HEAD:/dlls/dwrite/tests/analyzer.c#l1278

