[FriBidi] Invalid UTF-8 for Arabic
Yoann Roman
yroman-fribidi at altalang.com
Thu Mar 5 14:44:10 PST 2009
I'm using the trunk fribidi2 code, compiled with VS2003 on Windows XP,
and calling it from Python through the Pyfribidi extension, also
compiled with VS2003.
When I pass log2vis this Arabic Unicode string:
(u'\u0647\u0644 \u062a\u062a\u0646\u0627\u0648\u0644 \u0627\u0644\u0641'
 u'\u0637\u0648\u0631 \u0643\u0644 \u064a\u0648\u0645\u061f \u0644\u0645'
 u'\u0627\u0630\u0627 \u0623\u0648 \u0644\u0645\u0627\u0630\u0627 \u0644'
 u'\u0627\u061f')
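For reference, the input reads "هل تتناول الفطور كل يوم؟ لماذا أو لماذا لا؟". The sketch below is plain Python 3 with no Pyfribidi involved, and the variable names are only illustrative. It rebuilds the string and measures its UTF-8 size. Every non-space character lies in the Arabic block (U+0600..U+06FF), which UTF-8 encodes in two bytes per character.

```python
# Rebuild the input string quoted above and measure its UTF-8 size.
text = ("\u0647\u0644 \u062a\u062a\u0646\u0627\u0648\u0644 \u0627\u0644\u0641"
        "\u0637\u0648\u0631 \u0643\u0644 \u064a\u0648\u0645\u061f \u0644\u0645"
        "\u0627\u0630\u0627 \u0623\u0648 \u0644\u0645\u0627\u0630\u0627 \u0644"
        "\u0627\u061f")

encoded = text.encode("utf-8")

# Every non-space character is in U+0600..U+06FF, a two-byte range in UTF-8.
letters = [c for c in text if c != " "]
assert all(0x0600 <= ord(c) <= 0x06FF for c in letters)

print(len(text))         # 43 characters
print(text.count(" "))   # 8 spaces
print(len(encoded))      # 78 bytes: 35 letters * 2 bytes + 8 spaces
```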
I get back a string that is not valid UTF-8 because it ends in an
incomplete sequence:
('\xd8\x9f\xef\xbb\xbb\xef\xbb\xbf \xef\xba\x8d\xef\xba\xab\xef\xba\x8e'
 '\xef\xbb\xa4\xef\xbb\x9f \xef\xbb\xad\xef\xba\x83 \xef\xba\x8d\xef\xba'
 '\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f \xd8\x9f\xef\xbb\xa1\xef\xbb'
 '\xae\xef\xbb\xb3 \xef\xbb\x9e\xef\xbb\x9b \xef\xba\xad\xef\xbb\xae\xef'
 '\xbb\x84\xef\xbb')
The trailing \xef\xbb is only the first two bytes of a three-byte
sequence; it should have one more byte.
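The truncation can be checked with a short Python 3 sketch that uses only the standard library; `first_utf8_error` is a hypothetical helper written for this check, not part of Pyfribidi. It rebuilds the returned bytes, finds where strict UTF-8 decoding fails, and decodes the valid prefix. That prefix ends partway through الفطور, so the sentence's first two words (which come last in visual order) are missing as well. The returned data is exactly 78 bytes, the same length as the UTF-8 encoding of the input string, even though the shaped presentation forms take three bytes each.

```python
import unicodedata

# The log2vis result quoted above, rewritten as a Python 3 bytes literal.
output = (b"\xd8\x9f\xef\xbb\xbb\xef\xbb\xbf \xef\xba\x8d\xef\xba\xab\xef\xba\x8e"
          b"\xef\xbb\xa4\xef\xbb\x9f \xef\xbb\xad\xef\xba\x83 \xef\xba\x8d\xef\xba"
          b"\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f \xd8\x9f\xef\xbb\xa1\xef\xbb"
          b"\xae\xef\xbb\xb3 \xef\xbb\x9e\xef\xbb\x9b \xef\xba\xad\xef\xbb\xae\xef"
          b"\xbb\x84\xef\xbb")


def first_utf8_error(data):
    """Hypothetical helper: (offset, reason) of the first UTF-8 error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


error = first_utf8_error(output)
print(len(output), error)  # 78 (76, 'unexpected end of data')
print(output[error[0]:])   # b'\xef\xbb': a lead byte and one continuation byte

# The bytes before the error decode cleanly. Apart from spaces and the
# Arabic question mark, every character lies in U+FE70..U+FEFF, a range
# that UTF-8 encodes in three bytes.
prefix = output[:error[0]].decode("utf-8")
shaped = [c for c in prefix if c not in " \u061f"]
assert all(0xFE70 <= ord(c) <= 0xFEFF for c in shaped)
for ch in prefix[:3]:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```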
The Pyfribidi extension works fine with the 0.10.9 code (apart from the
missing Arabic joining), and the only change made to it for the 0.19.1
code is using FriBidiParType instead of FriBidiCharType.
Any ideas?
--
Yoann Roman