[HarfBuzz] A couple of clarifications regarding HarfBuzz

Behdad Esfahbod behdad at behdad.org
Thu Oct 21 11:55:51 PDT 2010

On 10/21/10 04:10, Tom Hacohen wrote:

>> Language is used to do language-specific adjustments when appropriate.  You
>> typically just pass the locale or whatever your higher-level tells you (think
>> of lang attribute in html) to hb_language_from_string.
> As I thought, thanks, I wasn't thinking about languages using the same
> script like many of the latin languages and their ligatures.

It's more than just Latin.

>> HarfBuzz does the right thing no matter what you pass in.  So you can safely
>> pass 0.  String length in characters would be most appropriate if you have it.
> I assumed HarfBuzz does well anyway, but I want the fastest way
> possible. Ok then, I have the string's length (as it's needed for
> buffer_add anyway).

If you have UTF-32 or UTF-16, just pass the length.  For UTF-8, passing the
byte length will typically overshoot by a factor of 2 or 3 for anything but
ASCII.  What you want is the number of characters, not the number of bytes.
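
To make the characters-versus-bytes point concrete, here is a minimal sketch
of counting characters in a UTF-8 string.  The function name utf8_char_count
is made up for illustration and is not part of HarfBuzz; it assumes
well-formed UTF-8 and simply counts every byte that is not a continuation
byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a HarfBuzz function.
 * Counts Unicode code points in a UTF-8 byte string by counting every byte
 * that is NOT a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input is well-formed UTF-8. */
size_t
utf8_char_count (const char *text, size_t byte_len)
{
  size_t count = 0;
  size_t i;

  assert (text != NULL || byte_len == 0);

  for (i = 0; i < byte_len; i++)
    if (((unsigned char) text[i] & 0xC0) != 0x80)
      count++;

  return count;
}
```

For a four-letter Hebrew word encoded as 8 bytes of UTF-8 this returns 4, so
using the byte length as the hint would overshoot by a factor of 2.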

>> The low-level API to fetch that information from GDEF is available through
>> hb_ot_layout_get_lig_carets(), however, very few fonts provide such
>> information.  It's common to just divide the width by the number of graphemes.
> graphemes being non diacritic glyphs?

Graphemes are what a user (of a language) considers to be one entity.  Unicode
defines them:


We may add code to HarfBuzz for that in the future.  A cheap heuristic is to
check for combining-class=0, that is, to treat each character whose combining
class is 0 as the start of a new grapheme.
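
The fallback described above (count graphemes with the combining-class
heuristic, then divide the ligature width evenly) can be sketched roughly as
follows.  Everything here is an assumption for illustration: the function
names are made up, and toy_combining_class is only a stand-in that knows the
range U+0300..U+0314.  Real code would get the combining class from a proper
Unicode database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real Unicode lookup.  It only knows that
 * U+0300..U+0314 (combining marks above) have combining class 230 and
 * reports 0 for everything else. */
unsigned int
toy_combining_class (uint32_t cp)
{
  return (cp >= 0x0300 && cp <= 0x0314) ? 230 : 0;
}

/* Cheap heuristic: every code point with combining class 0 starts a new
 * grapheme.  This is an approximation, not full Unicode segmentation. */
size_t
count_graphemes_approx (const uint32_t *cps, size_t len)
{
  size_t n = 0;
  size_t i;

  for (i = 0; i < len; i++)
    if (toy_combining_class (cps[i]) == 0)
      n++;

  return n;
}

/* Writes evenly spaced caret x-offsets inside a ligature of the given
 * width: n_graphemes - 1 carets, capped at max_carets.
 * Returns the number of carets written. */
size_t
even_ligature_carets (int32_t width, size_t n_graphemes,
                      int32_t *carets, size_t max_carets)
{
  size_t n_carets;
  size_t i;

  assert (carets != NULL || max_carets == 0);

  if (n_graphemes < 2)
    return 0; /* a single grapheme has no interior caret */

  n_carets = n_graphemes - 1;
  if (n_carets > max_carets)
    n_carets = max_carets;

  for (i = 0; i < n_carets; i++)
    carets[i] = (int32_t) ((int64_t) width * (int64_t) (i + 1)
                           / (int64_t) n_graphemes);

  return n_carets;
}
```

For an "fi" ligature 1000 units wide this gives a single caret at 500.  An
accent on the "f" (U+0301) does not change the count, since it has a nonzero
combining class.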


> Thanks a lot,
> Tom.
