[HarfBuzz] A couple of clarifications regarding HarfBuzz

Behdad Esfahbod behdad at behdad.org
Thu Oct 21 11:55:51 PDT 2010


On 10/21/10 04:10, Tom Hacohen wrote:

>> Language is used to do language-specific adjustments when appropriate.  You
>> typically just pass the locale, or whatever your higher level tells you (think
>> of the lang attribute in HTML), to hb_language_from_string.
>
> As I thought, thanks.  I wasn't thinking about languages that share a
> script, like the many Latin-script languages and their ligatures.

It's more than just Latin.


>> HarfBuzz does the right thing no matter what you pass in.   So you can safely
>> pass 0.  String length in characters would be most appropriate if you have it.
>
> I assumed HarfBuzz would do well either way, but I want the fastest path
> possible.  OK then, I have the string's length (it's needed for
> buffer_add anyway).

If you have UTF-32 or UTF-16, just pass the length, indeed.  For UTF-8, passing
the byte length will overshoot by a factor of 2 or 3 for non-ASCII text (up to 4
for characters outside the BMP).  What you want is the number of characters, not
the number of bytes.
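A minimal sketch of that count for UTF-8, in C: every code point has exactly one
lead byte, and continuation bytes always have the bit pattern 10xxxxxx, so
counting the bytes that are not continuation bytes gives the number of
characters.  The helper name utf8_char_count is hypothetical, and it assumes
well-formed UTF-8; on malformed input it still returns a number, just not a
meaningful one.

```c
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a UTF-8 byte string.
 * Assumes well-formed UTF-8.  Continuation bytes look like 10xxxxxx
 * (top two bits == 0x80 after masking with 0xC0); every other byte
 * starts a new code point, so we count only those. */
static size_t utf8_char_count(const char *s, size_t byte_len)
{
    size_t count = 0;
    for (size_t i = 0; i < byte_len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For example, the Hebrew word "shalom" (U+05E9 U+05DC U+05D5 U+05DD) is 8 bytes
in UTF-8 but 4 characters, so passing the byte length would overshoot by a
factor of 2.  The extra pass over the text is cheap compared to shaping itself.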


>> The low-level API to fetch that information from GDEF is available through
>> hb_ot_layout_get_lig_carets(); however, very few fonts provide such
>> information.  It's common to just divide the width by the number of graphemes.
>
> Graphemes being non-diacritic glyphs?

A grapheme is what a user (of a given language) perceives as a single
character.  Unicode defines grapheme clusters in UAX #29:

  http://www.unicode.org/reports/tr29/

We may add code for that to HarfBuzz in the future.  A cheap heuristic is to
treat each character with canonical combining class 0 as the start of a new
grapheme.
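Putting the two suggestions together, here is a rough C sketch of the fallback
for fonts without GDEF caret data: count graphemes with the combining-class
heuristic, then split the ligature's advance evenly among them.  All function
names are hypothetical, and the combining-class check is a deliberately tiny
stand-in that only knows the Combining Diacritical Marks block; real code would
query the Unicode Character Database (for example through HarfBuzz's
hb_unicode_combining_class()).  The heuristic is not the UAX #29 algorithm and
will miss cases such as conjoining Hangul jamo or emoji ZWJ sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Crude stand-in for a real combining-class lookup.  Treats U+0300..U+036F
 * as combining marks, except U+034F COMBINING GRAPHEME JOINER, which has
 * class 0.  Every other code point is reported as class 0 (a base). */
static int is_nonzero_combining_class(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F && cp != 0x034F;
}

/* Cheap grapheme heuristic: each code point with combining class 0
 * starts a new grapheme. */
static size_t count_graphemes(const uint32_t *cps, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_nonzero_combining_class(cps[i]))
            count++;
    return count;
}

/* Fallback ligature carets: divide the ligature's advance width evenly
 * among its graphemes.  A ligature of g graphemes gets g - 1 carets.
 * Writes at most max_carets positions and returns how many it wrote. */
static size_t even_lig_carets(int advance, const uint32_t *cps, size_t n,
                              int *carets, size_t max_carets)
{
    size_t g = count_graphemes(cps, n);
    size_t written = 0;
    for (size_t i = 1; i < g && written < max_carets; i++)
        carets[written++] = (int)((long)advance * (long)i / (long)g);
    return written;
}
```

For an "ffi" ligature with an advance of 900 units, this places carets at 300
and 600; for "e" + U+0301 + "f" it places a single caret at 450, since the
combining acute accent does not start a grapheme of its own.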

behdad


> Thanks a lot,
> Tom.