[FriBidi] Invalid UTF-8 for Arabic

Behdad Esfahbod behdad at behdad.org
Fri Mar 6 11:22:33 PST 2009


Yoann Roman wrote:
> I'm using the trunk fribidi2 code, compiled with VS2003 on Windows XP, 
> from Python with the Pyfribidi extension, also compiled on VS 2003.

The first step in debugging this is to rule out PyFriBidi as the culprit.
That is, can you reproduce the bug using plain C?  If yes, please send the code here.

Cheers,
behdad

> When giving log2vis this Arabic Unicode string:
> 
> u'\u0647\u0644 \u062a\u062a\u0646\u0627\u0648\u0644 \u0627\u0644\u0641
> \u0637\u0648\u0631 \u0643\u0644 \u064a\u0648\u0645\u061f \u0644\u0645
> \u0627\u0630\u0627 \u0623\u0648 \u0644\u0645\u0627\u0630\u0627 \u0644
> \u0627\u061f'
> 
> I get back an invalid UTF-8 code point, as shown below:
> 
> '\xd8\x9f\xef\xbb\xbb\xef\xbb\xbf \xef\xba\x8d\xef\xba\xab\xef\xba\x8e
> \xef\xbb\xa4\xef\xbb\x9f \xef\xbb\xad\xef\xba\x83 \xef\xba\x8d\xef\xba
> \xab\xef\xba\x8e\xef\xbb\xa4\xef\xbb\x9f \xd8\x9f\xef\xbb\xa1\xef\xbb
> \xae\xef\xbb\xb3 \xef\xbb\x9e\xef\xbb\x9b \xef\xba\xad\xef\xbb\xae\xef
> \xbb\x84\xef\xbb'
> 
> The last \xef\xbb should have one more byte.
> 
> The Pyfribidi extension works fine with the 0.10.9 code (minus the 
> Arabic joining), and its only update for the 0.19.1 code is to use 
> FriBidiParType instead of FriBidiCharType.
> 
> Any ideas?
> 
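
One quick way to confirm that the returned string is cut off mid-character,
rather than corrupted somewhere in the middle, is to look at where strict
UTF-8 decoding fails.  The following is only a sketch in Python 3 (the report
above used Python 2).  The helper names are made up for illustration and are
not part of PyFriBidi or FriBidi, and only the last few bytes of the reported
output are used.

```python
def expected_len(lead):
    """Length in bytes of a UTF-8 sequence starting with this lead byte (0 if invalid)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


def missing_tail_bytes(data):
    """Return how many bytes are missing from a sequence truncated at the end.

    Returns 0 for valid UTF-8.  Re-raises the decoding error if the problem
    is anything other than an incomplete final sequence.
    """
    try:
        data.decode("utf-8")
        return 0
    except UnicodeDecodeError as err:
        need = expected_len(data[err.start])
        # Only treat it as truncation if the bad span runs to the end of the
        # input and starts with a valid lead byte.
        if err.end != len(data) or need == 0:
            raise
        return need - (len(data) - err.start)


# Last bytes of the output quoted above (assumed contiguous across the line wrap).
tail = b"\xef\xba\xad\xef\xbb\xae\xef\xbb\x84\xef\xbb"
```

Calling missing_tail_bytes(tail) returns 1, which matches the observation
that the final \xef\xbb lacks one byte.  The same check can be run on the
whole returned string to verify that nothing else in it is malformed.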

