[HarfBuzz] What is wrong with unicode in harfbuzz?
Kelvin Ma
kelvinsthirteen at gmail.com
Fri Jun 17 01:35:03 UTC 2016
When I run a simple HarfBuzz shaping call like
> string = 'In begíffi our '
> utfstring = string.encode('utf-8')
>
> buf = hb.buffer_create()
> hb.buffer_add_utf8(buf, utfstring, 0, -1)
> hb.buffer_guess_segment_properties(buf)
>
> hb.shape(font, buf, [])
> infos = hb.buffer_get_glyph_infos(buf)
> positions = hb.buffer_get_glyph_positions(buf)
>
I get
len(string) = 15
len(infos) = 13
len(positions) = 13
which makes sense: the three characters "ffi" became one ligature glyph, so
15 characters make 13 glyphs. But the cluster values look wrong, because they
don’t line up with the character indexes any more (the accented character í
takes two bytes in UTF-8).
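Assuming the cluster values produced after hb.buffer_add_utf8 are byte offsets into the UTF-8 text, they can be translated back to character indexes in plain Python. This is only a minimal sketch: byte_to_char_index is a hypothetical helper name, not part of any HarfBuzz binding, and the string is written with an escape so that í is a single precomposed code point.

```python
def byte_to_char_index(text):
    """Map the UTF-8 byte offset where each character starts to its index in text."""
    mapping = {}
    offset = 0
    for index, char in enumerate(text):
        mapping[offset] = index
        # Advance by the number of bytes this character occupies in UTF-8.
        offset += len(char.encode('utf-8'))
    return mapping

string = 'In beg\u00edffi our '   # same text as above, with í as U+00ED
offsets = byte_to_char_index(string)

# í starts at byte 6 and occupies two bytes, so the first "f"
# starts at byte 8 even though it is character 7.
print(offsets[8])
```

With a table like this, a cluster value read from a glyph info could be looked up to find the character it came from, provided the cluster really is a UTF-8 byte offset.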
But then when I change it to utf-16
> string = 'In begíffi our '
> utfstring = string.encode('utf-16')
>
> buf = hb.buffer_create()
> hb.buffer_add_utf16(buf, utfstring, 0, -1)
> hb.buffer_guess_segment_properties(buf)
>
> hb.shape(font, buf, [])
> infos = hb.buffer_get_glyph_infos(buf)
> positions = hb.buffer_get_glyph_positions(buf)
>
I get
len(string) = 15
len(infos) = 32
len(positions) = 32
And when I change it to utf-32, which this post
<http://comments.gmane.org/gmane.comp.freedesktop.harfbuzz/1836> says
should make it report character counts:
> string = 'In begíffi our '
> utfstring = string.encode('utf-32')
>
> buf = hb.buffer_create()
> hb.buffer_add_utf32(buf, utfstring, 0, -1)
> hb.buffer_guess_segment_properties(buf)
>
> hb.shape(font, buf, [])
> infos = hb.buffer_get_glyph_infos(buf)
> positions = hb.buffer_get_glyph_positions(buf)
>
I get
len(string) = 15
len(infos) = 64
len(positions) = 64
What’s going on here? Why does HarfBuzz suddenly output 64 glyphs? I
thought the number of glyphs wasn’t supposed to depend on the original encoding.
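One thing that can be checked without HarfBuzz is how many bytes each encoding produces. The sketch below uses only the Python standard library. Python's 'utf-16' and 'utf-32' codecs prepend a byte order mark, and the resulting byte counts (32 and 64) happen to match the glyph counts above. Whether the binding is treating every byte as a separate item is an assumption here, not something this snippet verifies.

```python
string = 'In beg\u00edffi our '   # same text as above, with í as U+00ED

# Byte length of the encoded text for each codec.
# 'utf-16' and 'utf-32' include a byte order mark; the '-le' variants do not.
sizes = {}
for codec in ('utf-8', 'utf-16', 'utf-32', 'utf-16-le', 'utf-32-le'):
    sizes[codec] = len(string.encode(codec))
    print(codec, sizes[codec])

print('characters:', len(string))
```

If the byte counts are indeed what is being shaped, then encoding without a byte order mark and passing a length measured in code units rather than bytes would be the things to try, though that depends on how the binding's buffer_add_utf16 and buffer_add_utf32 interpret their arguments.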