[HarfBuzz] Questions regarding hb_language_t

Fri Jan 10 07:18:51 PST 2014

Follow-up to an earlier discussion with Khaled:

> You basically scan the text, itemize it into contiguous script runs and
> shape each one separately using HarfBuzz. If you are also doing BiDi
> itemization, then both can interfere (you might end up with runs
> containing only characters with common script property after doing BiDi,
> so they will be shaped with the default script which can be wrong), so
> you need to do script itemization first, and BiDi itemization separately,
> then combine both to get runs of the same script and direction to be
> shaped separately.

This has been synthesized into:
https://github.com/arielm/Unicode/tree/master/Projects/BIDI

The relevant "action" is taking place here:
https://github.com/arielm/Unicode/blob/master/Projects/BIDI/src/TextItemizer.cpp
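
For anyone who wants the combining step spelled out, here is a minimal
sketch in C++. It is not the code from TextItemizer.cpp: the struct names
(ScriptRun, DirectionRun, ShapingRun) and the function combineRuns are made
up for illustration. It assumes script itemization and BiDi itemization
have each already been run over the whole paragraph, and that both run
lists are sorted, gap-free, and use half-open [start, end) ranges over the
same text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical run types; ranges are half-open [start, end) in code points.
struct ScriptRun    { std::size_t start, end; std::string script; };  // e.g. "Hebr"
struct DirectionRun { std::size_t start, end; bool rtl; };
struct ShapingRun   { std::size_t start, end; std::string script; bool rtl; };

// Intersect the two run lists so that every output run has exactly one
// script and one direction. Common-script characters (digits, spaces) are
// assumed to have been resolved to a real script during script itemization,
// which is why that pass has to see the whole text rather than BiDi pieces.
std::vector<ShapingRun> combineRuns(const std::vector<ScriptRun>& scripts,
                                    const std::vector<DirectionRun>& dirs) {
    std::vector<ShapingRun> out;
    std::size_t i = 0, j = 0;
    while (i < scripts.size() && j < dirs.size()) {
        const std::size_t start = std::max(scripts[i].start, dirs[j].start);
        const std::size_t end   = std::min(scripts[i].end, dirs[j].end);
        if (start < end) {
            out.push_back({start, end, scripts[i].script, dirs[j].rtl});
        }
        // Advance whichever run ends first (both if they end together).
        const std::size_t scriptEnd = scripts[i].end;
        const std::size_t dirEnd    = dirs[j].end;
        if (scriptEnd <= dirEnd) ++i;
        if (dirEnd <= scriptEnd) ++j;
    }
    return out;
}
```

Each resulting run would then get its own buffer, with
hb_buffer_set_script() and hb_buffer_set_direction() set from the run
before calling hb_shape().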

HTH,
Ariel

On Sun, Dec 15, 2013 at 5:02 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:

> On Sun, Dec 15, 2013 at 04:38:51PM +0200, Ariel Malka wrote:
> > I have rendered text successfully with a few different complex scripts
> > ("Hebr", "Arab", "Hang", "Hani", "Thai", etc.) and it looks like the
> > hb_buffer_set_language() is not affecting the result.
> >
> > The first question I'm asking is therefore: what is the purpose
> > of hb_buffer_set_language()?
> > Or in other words: is there a combination which requires both the language
> > and script values to be defined?
>
> Many fonts have language-specific features, for example:
> https://bugs.webkit.org/show_bug.cgi?id=37984
>
> Without setting a language, HarfBuzz will use the ‘dflt’ language (AFAIK)
> and the result can be wrong in such cases.
>
> > My second question is regarding mapping: is there a way to obtain a
> > hb_script_tag from a language-code string (e.g. "he" ->
> > HB_SCRIPT_HEBREW)?
>
> Many languages are written in different scripts, so there is not always
> a one to one language to script mapping. The proper way to get the
> script of a piece of text is by checking the script property of its
> characters, using the algorithm described by Unicode:
> http://www.unicode.org/reports/tr24/
>
> You basically scan the text, itemize it into contiguous script runs and
> shape each one separately using HarfBuzz. If you are also doing BiDi
> itemization, then both can interfere (you might end up with runs
> containing only characters with common script property after doing BiDi,
> so they will be shaped with the default script which can be wrong), so
> you need to do script itemization first, and BiDi itemization separately,
> then combine both to get runs of the same script and direction to be
> shaped separately.
>
> I find this code an easy to grasp example:
> https://github.com/mapnik/mapnik/blob/master/src/text/itemizer.cpp
>
> Regards,
> Khaled
>