[HarfBuzz] Beginners question on unicode values

Khaled Hosny dr.khaled.hosny at gmail.com
Tue Feb 4 23:10:42 UTC 2020


Hi,

> On Feb 4, 2020, at 10:29 AM, Patrick <std300 at gmail.com> wrote:
> 
> Hello all,
> 
> (I am new to harfbuzz, I'd appreciate if you point me in the right direction)
> 
> I use LuaTeX, which has a harfbuzz integration, so there might be
> issues that are not related to harfbuzz.
> 
> I load a font that comes with the Apple Mac (zapfino.ttf) and I use
> shape_full() to get the shaping. First attempt is to shape the word
> "Za", which results in codepoints 104 and 504. Then I shape "Zapfino"
> which results in codepoint 1059.

HarfBuzz outputs glyph IDs. They are unfortunately stored in the `codepoint` field of hb_glyph_info_t, because the same buffer is used for both input and output.

> Now I need to find out the unicode values for the codepoints.

There is no one-to-one mapping between glyphs and Unicode code points. If you want the character(s) a glyph corresponds to, use the cluster field of the glyph info as an index into the input string (this gets a bit tricky with ligatures or one-to-many mappings; see https://harfbuzz.github.io/clusters.html). But since you want to render glyphs, this isn't what you need.
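A minimal sketch of the cluster lookup, in Python for illustration only. It assumes the cluster values are code point indices into the input string; HarfBuzz reports clusters in whatever units the text was added in (for example, byte offsets with hb_buffer_add_utf8()). The helper name `glyph_texts` is hypothetical. Each glyph is paired with the whole text of its cluster, so a ligature gets several characters, and the glyphs of a one-to-many decomposition share the same text.

```python
from typing import List, Tuple


def glyph_texts(text: str, glyphs: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Pair each (glyph_id, cluster) with the substring its cluster covers.

    Assumes clusters are code point indices into `text`. Each cluster runs
    from its start value up to the next larger start value (or end of text).
    """
    # Distinct cluster start values in logical order.
    starts = sorted({cluster for _, cluster in glyphs})
    # Each cluster ends where the next one begins; the last ends at len(text).
    ends = dict(zip(starts, starts[1:] + [len(text)]))
    return [(gid, text[cluster:ends[cluster]]) for gid, cluster in glyphs]
```

With the glyph IDs from the question, and assuming clusters 0 and 1 for "Za" and cluster 0 for the single "Zapfino" glyph, `glyph_texts("Za", [(104, 0), (504, 1)])` gives `[(104, "Z"), (504, "a")]` and `glyph_texts("Zapfino", [(1059, 0)])` gives `[(1059, "Zapfino")]`.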

> To get the unicode values, I first do a hb_face_collect_unicodes to
> get all unicode values in the font and then hb_font_get_nominal_glyph
> to map from a unicode value to a codepoint.
> 
> For 104 and 504 I get the unicode values 90 and 97 (dec), which
> correspond to 'Z' and 'a', so the first example looks fine. But I am
> unable to map 1059 to anything, since it does not appear in the list I
> get back from hb_face_collect_unicodes()
> 
> Is there a way to map the codepoint 1059 to a unicode value for this
> specific font? Or is my line of thought wrong and I need to do
> something completely different?
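The reverse lookup described in the question amounts to inverting the font's cmap. The minimal Python sketch below uses a hypothetical function name and a two-entry cmap built from the values in the question. It shows why glyph 1059 cannot be found this way: only glyphs that some code point maps to directly appear in the inverted table, and glyphs produced by substitutions during shaping (such as ligatures) never do.

```python
from typing import Dict


def invert_cmap(cmap: Dict[int, int]) -> Dict[int, int]:
    """Invert a code point -> nominal glyph map into glyph -> code point.

    Several code points may share one glyph; the smallest code point wins.
    Glyphs reachable only through shaping substitutions are absent.
    """
    reverse: Dict[int, int] = {}
    for cp, gid in sorted(cmap.items()):
        reverse.setdefault(gid, cp)
    return reverse
```

For example, `invert_cmap({90: 104, 97: 504})` maps 104 back to 90 ('Z') and 504 back to 97 ('a'), but has no entry for 1059.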

Your problem is that LuaTeX does not distinguish between glyphs and characters. More precisely, the char field of a LuaTeX glyph node is best thought of as an index into the font's characters table, not as an actual character.

What you need is to create entries in the font's characters table for all glyphs in the font and give them any character values you like. ConTeXt uses the PUA, which often conflicts with the font's own use of the PUA; I use 0x110000 + 256 + glyph index. You might want to check the code in https://github.com/khaledhosny/harf, which should work without luaotfload, except for the last few commits made before it was archived. LuaTeX has a rather backwards way of handling characters and glyphs, so things are unnecessarily complex.
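A Python model of the glyph-to-character scheme described above, assuming only what that paragraph states (0x110000 + 256 + glyph index). Because 0x110000 is one past the last Unicode code point (U+10FFFF), these values cannot collide with any real character, PUA included. The table layout and the `index` key are hypothetical stand-ins for a real LuaTeX characters table. In practice the glyph count would come from the font, for example via HarfBuzz's hb_face_get_glyph_count().

```python
from typing import Dict, Optional

# Offset from the scheme above: one past U+10FFFF, plus 256.
GLYPH_BASE = 0x110000 + 256


def glyph_to_char(gid: int) -> int:
    """Synthetic 'character' value used to address glyph `gid`."""
    return GLYPH_BASE + gid


def char_to_glyph(char: int) -> Optional[int]:
    """Recover the glyph index, or None for a value below the synthetic range."""
    return char - GLYPH_BASE if char >= GLYPH_BASE else None


def build_characters(num_glyphs: int) -> Dict[int, Dict[str, int]]:
    """One entry per glyph, keyed by its synthetic character value.

    The {"index": gid} payload is a hypothetical stand-in for a real
    LuaTeX characters-table entry.
    """
    return {glyph_to_char(gid): {"index": gid} for gid in range(num_glyphs)}
```

With this scheme, glyph 1059 from the question becomes character value 0x110523, and `char_to_glyph(0x110523)` gives back 1059, while an ordinary value such as 90 ('Z') maps to None.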

Regards,
Khaled

