[FriBidi] Invalid UTF-8 for Arabic

Yoann Roman yroman-fribidi at altalang.com
Wed Mar 11 07:41:16 PDT 2009


>> Seems the BOM mark appears in the place of the first character of the
>> LAM+ALEF ligature.  This might be a bug in the ligature replacement 
>> of the shaping function.
>>
>> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
>> fribidi_shape_arabic_ligature(), which replaces the first char of
>> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.
> 
> That's expected.  It will be removed by fribidi_remove_bidi_marks().
> 
>> Behdad, why don't you use U+FFFF for this purpose?
> 
> Because U+FFFF is not a valid character.  U+FEFF is harmless at that 
> position: it has a BN bidi category and is one of the characters 
> that any rendering pipeline removes anyway.
>
>> and why don't
>> fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
>> clean up these CHAR_FILLs?
> 
> The shaping functions do not remove any characters.  If they did it 
> would be hard to keep the mapping between visual and logical strings. 
> The idea is that we add filler chars when needing to remove any 
> characters, and fribidi_remove_bidi_marks() or similar functions will 
> remove the fillers later.
> 
>> Behdad, I can fix the problem in CVS if you tell me what's the best
>> way to fix this.
> 
> No fix needed AFAI'm concerned :).

So, practically, I need to call fribidi_remove_bidi_marks after calling 
log2vis? I see that's done by fribidi.exe if I pass in --clean.
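
To check that I understand the filler idea, here is a minimal standalone sketch of it. This is not FriBidi code: strip_fill() and FILL_CHAR are hypothetical names of mine, and the only thing taken from the thread is that the filler is U+FEFF. It drops the fillers from a UTF-32 buffer after the fact and compacts a visual-to-logical position array in step, so the surviving entries still point at their original logical positions. As I read the reply above, fribidi_remove_bidi_marks() is what does this job in the library itself, and presumably it strips more than just this one character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* U+FEFF (ZWNBSP/BOM): the value the thread says FRIBIDI_CHAR_FILL maps to.
 * FILL_CHAR is a hypothetical name used only in this sketch. */
#define FILL_CHAR 0xFEFFu

/*
 * Hypothetical helper, NOT part of FriBidi.
 * Compacts `str` in place by dropping every FILL_CHAR. If `v_to_l`
 * (visual-to-logical positions) is non-NULL, it is compacted in step,
 * so each surviving character keeps its original logical position.
 * Returns the new length of the string.
 */
size_t strip_fill(uint32_t *str, size_t len, int *v_to_l)
{
    size_t out = 0;

    assert(str != NULL || len == 0);

    for (size_t in = 0; in < len; in++) {
        if (str[in] == FILL_CHAR)
            continue;               /* skip the filler left by shaping */
        str[out] = str[in];
        if (v_to_l != NULL)
            v_to_l[out] = v_to_l[in];
        out++;
    }
    return out;
}
```

With a buffer such as { U+0628, U+FEFF, U+FEFB, 'A' }, where U+FEFB stands in for a shaped LAM+ALEF ligature and U+FEFF for the slot the ligature freed up, the call shortens the string by one and leaves the other three characters and their map entries in order. Shaping can then stay length-preserving, and the cleanup is one explicit pass at the end.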

Thanks,

-- 
Yoann Roman


