[FriBidi] Invalid UTF-8 for Arabic

Yoann Roman yroman at altalang.com
Fri Mar 6 14:55:48 PST 2009


Behdad Esfahbod wrote:
>> I'm using the trunk fribidi2 code, compiled with VS2003 on Windows 
>> XP, from Python with the Pyfribidi extension, also compiled on VS 
>> 2003.
> 
> The first step to debug this is to make sure PyFriBidi is not the 
> culprit. That is, can you reproduce the bug using C?  If yes, please 
> send the code here.

I'm no C expert, so I took a slightly different approach to take 
Pyfribidi out of the equation: I compiled fribidi.exe and used a hex 
editor to check its output. It looks like the lost final byte may be a 
Pyfribidi problem. This test did turn up another bug, though.

Attached is a zip with:

  - arabic.input: the Arabic string straight out of Python. This will 
    show up correctly in anything with bidi support (e.g., Notepad on 
    Windows XP with Arabic support installed). There is no BOM.
  
  - arabic.output: output from running bin\fribidi.exe --nopad 
    arabic.input. No Python involved here.
  
  - arabic-correct.png: the correct visual rendering, as shown in Word

  - arabic-incorrect.png: the rendering I get from arabic.output

If you open arabic.output in a hex editor, you'll see that bytes 5 
through 7 contain the UTF-8 BOM sequence (EF BB BF), even though 
arabic.input has no BOM. Otherwise, no characters appear to be missing.
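For anyone who wants to repeat the hex-editor check without a hex editor, here is a minimal Python sketch (not part of fribidi or Pyfribidi; the function names and the script name find_bom.py are hypothetical). It lists every 0-based offset where the UTF-8 BOM bytes EF BB BF occur, and separately checks whether the file decodes as UTF-8. The two checks are kept apart on purpose: a stray BOM is U+FEFF encoded correctly, so a UTF-8 decoder accepts it and only a byte search will find it.

```python
import codecs
import sys
from typing import List, Optional

# UTF-8 encoding of U+FEFF (the byte order mark): b"\xef\xbb\xbf"
UTF8_BOM = codecs.BOM_UTF8


def find_bom_offsets(data: bytes) -> List[int]:
    """Return the 0-based offset of every UTF-8 BOM sequence in data."""
    offsets = []
    start = 0
    while True:
        i = data.find(UTF8_BOM, start)
        if i == -1:
            return offsets
        offsets.append(i)
        start = i + len(UTF8_BOM)  # continue searching after this BOM


def check_utf8(data: bytes) -> Optional[str]:
    """Return None if data is valid UTF-8, else a description of the first error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte {exc.start}: {exc.reason}"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bom.py arabic.output
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    print("BOM offsets:", find_bom_offsets(contents))
    print("UTF-8 check:", check_utf8(contents) or "ok")
```

Running it on both arabic.input and arabic.output would show whether the BOM is introduced by fribidi.exe itself. Note that hex editors usually number bytes from 0, so "bytes 5 through 7" may appear as offset 4 or 5 depending on how they are counted.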

Is this enough info to track this new issue down?

Thanks,

-- 
Yoann Roman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Arabic.zip
Type: application/x-zip-compressed
Size: 11060 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/fribidi/attachments/20090306/8f0d3e00/attachment.bin 
