[FriBidi] Invalid UTF-8 for Arabic

Yoann Roman yroman-fribidi at altalang.com
Thu Mar 5 14:44:10 PST 2009


I'm using the fribidi2 trunk code, compiled with VS 2003 on Windows XP,
from Python via the Pyfribidi extension, which is also compiled with VS 2003.

When I pass this Arabic Unicode string to log2vis:

(u'\u0647\u0644 \u062a\u062a\u0646\u0627\u0648\u0644 '
 u'\u0627\u0644\u0641\u0637\u0648\u0631 \u0643\u0644 '
 u'\u064a\u0648\u0645\u061f '
 u'\u0644\u0645\u0627\u0630\u0627 \u0623\u0648 '
 u'\u0644\u0645\u0627\u0630\u0627 \u0644\u0627\u061f')

I get back a UTF-8 string that ends in an invalid (incomplete) sequence:

('\xd8\x9f\xef\xbb\xbb\xef\xbb\xbf '
 '\xef\xba\x8d\xef\xba\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f '
 '\xef\xbb\xad\xef\xba\x83 '
 '\xef\xba\x8d\xef\xba\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f '
 '\xd8\x9f\xef\xbb\xa1\xef\xbb\xae\xef\xbb\xb3 '
 '\xef\xbb\x9e\xef\xbb\x9b '
 '\xef\xba\xad\xef\xbb\xae\xef\xbb\x84\xef\xbb')

The final \xef\xbb is an incomplete three-byte sequence; it should have one more byte.
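To check the symptom outside the extension, both strings from this message can be pasted into Python 3 (the u'' prefix is Python 2 syntax; the escapes themselves are unchanged). This sketch rejoins the mail-wrapped lines, attempts a strict UTF-8 decode, and compares lengths. With these values the bad sequence starts at byte 76 of 78, only 30 code points decode (against 43 in the input), and the output is exactly as long as the input's UTF-8 encoding. That pattern looks more like the output being cut off at a fixed size than a single dropped byte, but this is an inference from the byte counts, not something confirmed here.

```python
# Logical-order input from the post, as a Python 3 str literal.
logical = (
    "\u0647\u0644 \u062a\u062a\u0646\u0627\u0648\u0644 "
    "\u0627\u0644\u0641\u0637\u0648\u0631 \u0643\u0644 "
    "\u064a\u0648\u0645\u061f "
    "\u0644\u0645\u0627\u0630\u0627 \u0623\u0648 "
    "\u0644\u0645\u0627\u0630\u0627 \u0644\u0627\u061f"
)

# Visual-order bytes returned by log2vis, rejoined from the wrapped mail lines.
visual = (
    b"\xd8\x9f\xef\xbb\xbb\xef\xbb\xbf "
    b"\xef\xba\x8d\xef\xba\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f "
    b"\xef\xbb\xad\xef\xba\x83 "
    b"\xef\xba\x8d\xef\xba\xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f "
    b"\xd8\x9f\xef\xbb\xa1\xef\xbb\xae\xef\xbb\xb3 "
    b"\xef\xbb\x9e\xef\xbb\x9b "
    b"\xef\xba\xad\xef\xbb\xae\xef\xbb\x84\xef\xbb"
)

input_bytes = len(logical.encode("utf-8"))
print("input:", len(logical), "code points,", input_bytes, "UTF-8 bytes")
print("output:", len(visual), "bytes")

try:
    visual.decode("utf-8")  # strict decoding raises on the truncated tail
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte of the offending sequence
    prefix = visual[:err.start].decode("utf-8")
    print("invalid UTF-8 starting at byte", err.start, "-", err.reason)
    print("decodable prefix:", len(prefix), "code points")
```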

The Pyfribidi extension works fine with the 0.10.9 code (apart from the
missing Arabic joining), and its only change for the 0.19.1 code is to use
FriBidiParType instead of FriBidiCharType.
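One possible reason 0.10.9 appears fine: without Arabic joining, the visual output presumably stays in the U+0600 to U+06FF block, where each letter takes 2 bytes in UTF-8, the same as the input. With joining and ligatures, letters become presentation forms (U+FB50 to U+FDFF and U+FE70 to U+FEFF), which take 3 bytes each, so the same number of characters needs more bytes. If the wrapper sizes its UTF-8 output buffer from the input's byte length (an unconfirmed hypothesis; the Pyfribidi source is not quoted here), shaped text would no longer fit. The sketch compares the last word of the sample before and after shaping, using the characters that appear at the start of the quoted output. `utf8_worst_case` is a hypothetical helper, not part of FriBidi or Pyfribidi.

```python
# Last word of the input in logical order: LAM, ALEF, ARABIC QUESTION MARK.
unshaped = "\u0644\u0627\u061f"
# The first three characters of the quoted log2vis output:
# ARABIC QUESTION MARK, LAM WITH ALEF ISOLATED FORM (U+FEFB), then U+FEFF.
shaped = "\u061f\ufefb\ufeff"

for label, text in (("unshaped", unshaped), ("shaped", shaped)):
    print(label, len(text), "code points,", len(text.encode("utf-8")), "UTF-8 bytes")


def utf8_worst_case(num_code_points: int) -> int:
    """Hypothetical helper: bytes needed if every code point takes the
    UTF-8 maximum of 4 bytes, plus one byte for a terminating NUL."""
    return 4 * num_code_points + 1


# The sample input has 43 code points.
print(utf8_worst_case(43))
```

Sizing the buffer from the code-point count rather than the input byte count stays safe whatever range the shaped characters land in.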

Any ideas?

-- 
Yoann Roman
