[HarfBuzz] Normalization-aware font fallback (was Re: HarfBuzz 1.0 API; the message you were hoping would never come)

Behdad Esfahbod behdad at behdad.org
Wed Aug 6 14:14:47 PDT 2014


On 14-01-03 05:07 AM, Jonathan Kew wrote:
>>
>> This makes me realize that I don't understand the big picture of how
>> this fallback process interacts with harfbuzz. In order to do fallback,
>> you need to do character to glyph mapping.
> 
> Not necessarily. You need to know the character repertoire supported by the
> font, but you may not need to actually map to glyphs. In Firefox, for
> instance, font fallback is done based on a per-font *bit* map of supported
> Unicode codepoints. So at the font fallback stage, we know whether the
> character is present, but do not map it to a glyph.
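
For illustration, here is a minimal sketch of the bitmap-based fallback
described in the quote above.  It is not Firefox code: CoverageBitmap and
pick_font are hypothetical names, and a real implementation would build the
bitmap from each font's cmap table rather than from a hand-written range.

```python
class CoverageBitmap:
    """Bit set over Unicode codepoints (hypothetical helper)."""

    def __init__(self, codepoints=()):
        # A Python int is arbitrary precision, so one int can act as the bitmap.
        self.bits = 0
        for cp in codepoints:
            self.bits |= 1 << cp

    def has(self, cp):
        """True if the bit for codepoint cp is set."""
        return bool((self.bits >> cp) & 1)

    def covers(self, text):
        """True if every character of text is present; no glyph lookup happens."""
        return all(self.has(ord(ch)) for ch in text)


def pick_font(text, fonts):
    """Return the name of the first (name, bitmap) pair covering text, else None."""
    for name, bitmap in fonts:
        if bitmap.covers(text):
            return name
    return None
```

The point of the design is that the fallback stage only answers "is this
character present?" and leaves the character-to-glyph mapping to the shaper.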

When we had this discussion back in January I started putting a hack together,
and I have only just got it working.  I've pushed it to the hb-fc branch of my
GitHub repo:

  https://github.com/behdad/harfbuzz/commits/hb-fc

It introduces a (not yet public) hb-fc.h header:

  https://github.com/behdad/harfbuzz/blob/hb-fc/util/hb-fc.h

And a cmdline tool called hb-fc-list:

  https://github.com/behdad/harfbuzz/blob/hb-fc/util/hb-fc-list.c

hb-fc-list lists (à la fc-list) all fonts that can render a given string using
hb_shape().  I.e., it takes HarfBuzz's normalization process into account.
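
To make "normalization-aware" concrete, here is a rough sketch of a coverage
check that accepts a string if the font supports it as given, in composed
(NFC) form, or in decomposed (NFD) form.  This is not what hb-fc.cc does: it
is an approximation that works on whole strings, whereas a shaper can make
that choice at a much finer grain.  can_render and the `supported` set of
codepoints are hypothetical names.

```python
import unicodedata


def can_render(text, supported):
    """Return True if some normalization form of text is fully covered.

    supported is a set of integer codepoints that the font maps.
    """
    candidates = (
        text,                                # as typed
        unicodedata.normalize("NFC", text),  # fully composed
        unicodedata.normalize("NFD", text),  # fully decomposed
    )
    return any(
        all(ord(ch) in supported for ch in candidate)
        for candidate in candidates
    )
```

For example, a font that maps "e" and U+0301 COMBINING ACUTE ACCENT but not
U+00E9 still passes for the string "é", which a plain per-character charset
query would reject.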

I haven't tested it for tricky cases.  The source code itself is the best
documentation at this point:

  https://github.com/behdad/harfbuzz/blob/hb-fc/util/hb-fc.cc

(Just filed this bug re variation-selectors support in fontconfig:
 https://bugs.freedesktop.org/show_bug.cgi?id=82266 )

Here's a run:

behdad:util 0$ time fc-list | wc -l
562

real	0m0.022s
user	0m0.014s
sys	0m0.008s
behdad:util 0$ time ./hb-fc-list حرف‌باز | wc -l
59

real	0m0.043s
user	0m0.030s
sys	0m0.017s

Note that there's a ZWNJ (U+200C) in that string.  If I just query fc-list for
fonts that cover all the characters in that string, it doesn't list fonts that
don't map ZWNJ, even though they are perfectly fine for shaping:

0$ time fc-list :charset=062D,0631,0641,200C,0628,0627,0632 | wc -l
39

real	0m0.021s
user	0m0.010s
sys	0m0.008s
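
The gap between the two counts can be sketched as two coverage predicates.
This assumes the difference comes from characters such as ZWNJ that need no
glyph of their own to shape correctly.  The function names are hypothetical,
and DEFAULT_IGNORABLE is a small, deliberately incomplete sample of Unicode's
Default_Ignorable_Code_Point set, not the list HarfBuzz uses.

```python
# Incomplete sample: soft hyphen, ZWSP, ZWNJ, ZWJ, word joiner, BOM/ZWNBSP.
DEFAULT_IGNORABLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def covers_strict(text, supported):
    """Charset-style check: every codepoint must be mapped by the font."""
    return all(ord(ch) in supported for ch in text)


def covers_for_shaping(text, supported):
    """Same check, but default-ignorable characters need not be mapped."""
    return all(
        ord(ch) in supported or ord(ch) in DEFAULT_IGNORABLE
        for ch in text
    )
```

With the seven codepoints from the query above, a font that maps the six
Arabic letters but not U+200C fails covers_strict and passes
covers_for_shaping.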

Thoughts?
-- 
behdad
http://behdad.org/

