[HarfBuzz] Questions regarding hb_language_t

Wed Jan 22 06:08:32 PST 2014

Motivated by this discussion, I managed to add script itemization to
the HarfBuzz “port” of LibreOffice. Just in case anyone is interested:
https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=1615b7f1d078b2bdf22a856066346e701f816b72

Regards,
Khaled

On Fri, Jan 10, 2014 at 05:18:51PM +0200, Ariel Malka wrote:
> Follow-up to an earlier discussion with Khaled:
> 
> > You basically scan the text, itemize it into contiguous script runs and
> > shape each one separately using HarfBuzz. If you are also doing BiDi
> > itemization, then both can interfere (you might end up with runs
> > containing only characters with the Common script property after doing
> > BiDi, so they will be shaped with the default script, which can be
> > wrong), so you need to do script itemization first and BiDi itemization
> > separately, then combine both to get runs of the same script and
> > direction to be shaped separately.
> 
> This has been synthesized into:
> https://github.com/arielm/Unicode/tree/master/Projects/BIDI
> 
> The relevant "action" is taking place here:
> https://github.com/arielm/Unicode/blob/master/Projects/BIDI/src/TextItemizer.cpp
> 
> HTH,
> Ariel
> 
> 
> On Sun, Dec 15, 2013 at 5:02 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> 
> > On Sun, Dec 15, 2013 at 04:38:51PM +0200, Ariel Malka wrote:
> > > I have rendered text successfully with a few different complex scripts
> > > ("Hebr", "Arab", "Hang", "Hani", "Thai", etc.) and it looks like the
> > > hb_buffer_set_language() is not affecting the result.
> > >
> > > The first question I'm asking is therefore: what is the purpose
> > > of hb_buffer_set_language()?
> > > Or in other words: is there a combination which requires both the
> > > language and script values to be defined?
> >
> > Many fonts have language-specific features, for example:
> > https://bugs.webkit.org/show_bug.cgi?id=37984
> >
> > Without setting a language, HarfBuzz will use the ‘dflt’ language (AFAIK)
> > and the result can be wrong in such cases.
> >
> > > My second question is regarding mapping: is there a way to obtain an
> > > hb_script_t from a language-code string (e.g. "he" ->
> > > HB_SCRIPT_HEBREW)?
> >
> > Many languages are written in different scripts, so there is not always
> > a one-to-one language-to-script mapping. The proper way to get the
> > script of a piece of text is by checking the script property of its
> > characters, using the algorithm described by Unicode:
> > http://www.unicode.org/reports/tr24/
> >
> > You basically scan the text, itemize it into contiguous script runs and
> > shape each one separately using HarfBuzz. If you are also doing BiDi
> > itemization, then both can interfere (you might end up with runs
> > containing only characters with the Common script property after doing
> > BiDi, so they will be shaped with the default script, which can be
> > wrong), so you need to do script itemization first and BiDi itemization
> > separately, then combine both to get runs of the same script and
> > direction to be shaped separately.
> >
> > I find this code an easy-to-grasp example:
> > https://github.com/mapnik/mapnik/blob/master/src/text/itemizer.cpp
> >
> > Regards,
> > Khaled
> >