[HarfBuzz] Questions regarding hb_language_t

Sun Dec 15 07:02:30 PST 2013

On Sun, Dec 15, 2013 at 04:38:51PM +0200, Ariel Malka wrote:
> I have rendered text successfully with a few different complex scripts
> ("Hebr", "Arab", "Hang", "Hani", "Thai", etc.) and it looks like the
> hb_buffer_set_language() is not affecting the result.
> 
> The first question I'm asking is therefore: what is the purpose
> of hb_buffer_set_language()?
> Or in other words: is there a combination which require both the language
> and script values to be defined?

Many fonts have language-specific features, for example:
https://bugs.webkit.org/show_bug.cgi?id=37984

Without setting a language, HarfBuzz will use the ‘dflt’ language (AFAIK)
and the result can be wrong in such cases.

> My second question is regarding mapping: is there a way to obtain a
> hb_script_tag from a language-code string (e.g. "he" -> HB_SCRIPT_HEBREW)?

Many languages are written in different scripts, so there is not always
a one-to-one language to script mapping. The proper way to get the
script of a piece of text is by checking the script property of its
characters, using the algorithm described by Unicode:
http://www.unicode.org/reports/tr24/

You basically scan the text, itemize it into contiguous script runs and
shape each one separately using HarfBuzz. If you are also doing BiDi
itemization, then both can interfere (you might end up with runs
containing only characters with the common script property after doing
BiDi, so they will be shaped with the default script, which can be
wrong), so you need to do script itemization first and BiDi itemization
separately, then combine both to get runs of the same script and
direction to be shaped separately.
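
A rough sketch of the script itemization step, under loud assumptions:
script_of() is a toy lookup covering only a few ranges (real code should
query the Unicode Script property, e.g. with hb_unicode_script() or ICU),
and the resolution of common characters is simplified compared to what
UAX #24 describes (no paired-bracket handling, no Inherited class). The
names Script, Run and itemize_scripts are made up for the example:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy script classification, an assumption for illustration only.
enum class Script { Common, Latin, Hebrew, Arabic };

Script script_of(char32_t c)
{
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c >= 0x0590 && c <= 0x05FF) return Script::Hebrew;
    if (c >= 0x0600 && c <= 0x06FF) return Script::Arabic;
    return Script::Common;  // spaces, digits, punctuation, everything else
}

struct Run {
    std::size_t start;  // index of the first code point in the run
    std::size_t end;    // one past the last code point in the run
    Script script;
};

// Split text into contiguous script runs. Common characters join the run
// that precedes them; a leading Common run takes the first real script seen.
std::vector<Run> itemize_scripts(const std::u32string &text)
{
    std::vector<Run> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        Script s = script_of(text[i]);
        if (runs.empty()) {
            runs.push_back({i, i + 1, s});
            continue;
        }
        Run &last = runs.back();
        if (s == Script::Common || s == last.script) {
            last.end = i + 1;               // extend the current run
        } else if (last.script == Script::Common) {
            last.script = s;                // resolve the leading Common run
            last.end = i + 1;
        } else {
            runs.push_back({i, i + 1, s});  // script changed: start a new run
        }
    }
    return runs;
}
```

Each resulting run would then be intersected with the BiDi runs, and every
piece gets its own buffer with the matching hb_buffer_set_script() and
hb_buffer_set_direction() calls.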

I find this code an easy-to-grasp example:
https://github.com/mapnik/mapnik/blob/master/src/text/itemizer.cpp

Regards,
Khaled