[HarfBuzz] HarfBuzz glyph offsets

Thu Dec 24 13:07:54 PST 2015

On Thu, Dec 24, 2015 at 12:50:42PM -0800, Jonathan Blow wrote:
> Khaled wrote:
> 
> 
> 
> > Each Unicode character has a script property, so you don’t need to hard
> > code it for the text. The only complication is inherited or common
> > characters, but there is a simple heuristic to handle them, see for
> > example:
> > https://github.com/HOST-Oman/libraqm/blob/master/raqm.c#L289
> >
> > But if you are sure your text is always single script and language (I
> > see the Arabic has English words, so doesn’t seem to be the case), then
> > you can hard code the script values.
> >
> 
> Does this mean that passing UNKNOWN and letting HB figure it out is the
> right thing then?

HarfBuzz does not do any such detection by default (there is the guess
segment properties function, hb_buffer_guess_segment_properties(), but
it does very simplistic detection and is meant only for quick testing,
not real-world use).
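
To make the heuristic for common and inherited characters concrete, here
is a minimal Python sketch of splitting text into script runs before
shaping. It is not the libraqm code, only one simple version of the
idea: a common or inherited character takes the script of the character
before it, and any such characters at the very start take the first real
script found in the text. char_script() and itemize() are hypothetical
names, and char_script() is a crude stand-in that guesses from the
Unicode character name; real code should use the actual Unicode Script
property (for example via ICU or HarfBuzz's Unicode functions).

```python
import unicodedata

UNRESOLVED = ("Zyyy", "Zinh")  # ISO 15924 codes for Common and Inherited


def char_script(ch):
    """Hypothetical, approximate script lookup (Arabic and Latin only)."""
    if unicodedata.category(ch).startswith("M"):
        return "Zinh"  # treat combining marks as inherited
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "Arab"
    if name.startswith("LATIN"):
        return "Latn"
    return "Zyyy"  # everything else is treated as common


def itemize(text):
    """Return a list of (start, end, script) runs in logical order."""
    scripts = [char_script(ch) for ch in text]

    # Forward pass: common/inherited characters follow the previous script.
    last = None
    for i, s in enumerate(scripts):
        if s in UNRESOLVED:
            if last is not None:
                scripts[i] = last
        else:
            last = s

    # Anything still unresolved sits before the first real script.
    first = next((s for s in scripts if s not in UNRESOLVED), "Zyyy")
    scripts = [first if s in UNRESOLVED else s for s in scripts]

    # Merge neighbouring characters with the same script into runs.
    runs = []
    for i, s in enumerate(scripts):
        if runs and runs[-1][2] == s:
            runs[-1] = (runs[-1][0], i + 1, s)
        else:
            runs.append((i, i + 1, s))
    return runs
```

Each run would then go into its own buffer with its own script (and
direction, after bidi processing) set before shaping, instead of one
hard-coded script for the whole string.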

> For example: Is there some sample text in mixed Arabic w/ bidi English
> names, etc, that will come out wrong if I just set the language to "arb"
> and script to arabic? That is what I am doing in those screenshots, and
> whereas "they look fine to me" we all know that is no guarantee things
> aren't horrible in some corner case.

Here is a quick test:
~$ hb-shape DejaVuSans.ttf fiAV --script=latn --direction=ltr
[fi=0+1290|A=2+1270|V=3+1401]

~$ hb-shape DejaVuSans.ttf fiAV --script=arab --direction=ltr
[f=0+721|i=1+569|A=2+1401|V=3+1401]

You get no ligature or kerning in the second case (the fi ligature is
not formed, and A keeps its unkerned advance of 1401 instead of 1270),
and probably no Latin-specific features will be activated at all. Not
all fonts will fail like this, but many will.

Regards,
Khaled