<div dir="ltr"><div><div><div><div><div><div>When I run a simple HarfBuzz shaping call like<br><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><span style="font-family:monospace,monospace">string = 'In begíffi our '<br>utfstring = string.encode('utf-8')<br><br>buf = hb.buffer_create()<br>hb.buffer_add_utf8(buf, utfstring, 0, -1)<br>hb.buffer_guess_segment_properties(buf)<br><br>hb.shape(font, buf, [])<br>infos = hb.buffer_get_glyph_infos(buf)<br>positions = hb.buffer_get_glyph_positions(buf)</span><br></blockquote><br></div>I get<br><br><span style="font-family:monospace,monospace">len(string) = 15<br>len(infos) = 13<br>len(positions) = 13</span><br><br></div>which makes sense: three characters (ffi) became one glyph, so 15 characters make 13 glyphs. But the cluster values don’t line up with the character indexes any more, because the accented character takes two bytes in UTF-8.<br><br></div>But then when I change it to utf-16<br><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><span style="font-family:monospace,monospace">string = 'In begíffi our '</span><br><span style="font-family:monospace,monospace">utfstring = string.encode('utf-16')</span><br><span style="font-family:monospace,monospace"></span><br><span style="font-family:monospace,monospace">buf = hb.buffer_create()</span><br><span style="font-family:monospace,monospace">hb.buffer_add_utf16(buf, utfstring, 0, -1)</span><br><span style="font-family:monospace,monospace">hb.buffer_guess_segment_properties(buf)</span><br><span style="font-family:monospace,monospace"></span><br><span style="font-family:monospace,monospace">hb.shape(font, buf, [])</span><br><span style="font-family:monospace,monospace">infos = hb.buffer_get_glyph_infos(buf)</span><br><span style="font-family:monospace,monospace">positions = hb.buffer_get_glyph_positions(buf)</span><br><span 
style="font-family:monospace,monospace"></span></blockquote><span style="font-family:monospace,monospace"><br></span></div><span style="font-family:arial,helvetica,sans-serif">I get<br></span><br><span style="font-family:monospace,monospace">len(string) = 15<br>len(infos) = 32<br>len(positions) = 32</span><br><br></div>And when I change it to utf-32, which <a href="http://comments.gmane.org/gmane.comp.freedesktop.harfbuzz/1836">this post</a> says should make it give character counts:<br><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><span style="font-family:monospace,monospace">string = 'In begíffi our '</span><br><span style="font-family:monospace,monospace">utfstring = string.encode('utf-32')</span><br><span style="font-family:monospace,monospace"></span><br><span style="font-family:monospace,monospace">buf = hb.buffer_create()</span><br><span style="font-family:monospace,monospace">hb.buffer_add_utf32(buf, utfstring, 0, -1)</span><br><span style="font-family:monospace,monospace">hb.buffer_guess_segment_properties(buf)</span><br><span style="font-family:monospace,monospace"></span><br><span style="font-family:monospace,monospace">hb.shape(font, buf, [])</span><br><span style="font-family:monospace,monospace">infos = hb.buffer_get_glyph_infos(buf)</span><br><span style="font-family:monospace,monospace">positions = hb.buffer_get_glyph_positions(buf)</span><br><span style="font-family:monospace,monospace"></span></blockquote><span style="font-family:monospace,monospace"><br></span><span style="font-family:arial,helvetica,sans-serif">I get<br></span><span style="font-family:monospace,monospace"><br>len(string) = 15<br>len(infos) = 64<br>len(positions) = 64</span><br><br></div>What’s going on here? Why does HarfBuzz suddenly output 64 glyphs? I thought glyphs weren’t supposed to depend on the original encoding.<br></div>
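<div dir="ltr">As a sanity check outside HarfBuzz, here is a small plain-Python sketch that prints how many bytes each encoding produces for the same string. It only uses <span style="font-family:monospace,monospace">str.encode</span> and <span style="font-family:monospace,monospace">len</span>; the names <span style="font-family:monospace,monospace">text</span> and <span style="font-family:monospace,monospace">sizes</span> are just for illustration. That the byte counts happen to match the 32 and 64 above is only a possible lead, not a confirmed explanation of what the binding does with the length.</div>

```python
# Same string as in the snippets above; the accented i is written as an
# escape (U+00ED) so the source file encoding cannot change the result.
text = 'In beg\u00edffi our '

# Number of bytes produced by each encoding. Python's plain 'utf-16' and
# 'utf-32' codecs prepend a byte order mark (BOM); the '-le' variants do not.
sizes = {
    enc: len(text.encode(enc))
    for enc in ('utf-8', 'utf-16', 'utf-16-le', 'utf-32', 'utf-32-le')
}

for enc, size in sizes.items():
    print(f'{enc:10s} {size} bytes')

# Code units (not bytes) for the BOM-less encodings: 2 bytes per UTF-16 unit
# and 4 bytes per UTF-32 unit for this string.
utf16_units = sizes['utf-16-le'] // 2
utf32_units = sizes['utf-32-le'] // 4
print('characters:', len(text), 'utf-16 units:', utf16_units,
      'utf-32 units:', utf32_units)
```

<div dir="ltr">If the byte count rather than the code-unit count ends up being used as the item length, that might account for the extra glyphs, but I have not verified this against the binding.</div>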