[FriBidi] Invalid UTF-8 for Arabic
Yoann Roman
yroman-fribidi at altalang.com
Wed Mar 11 07:41:16 PDT 2009
>> Seems the BOM mark appears in the place of the first character of the
>> LAM+ALEF ligature. This might be a bug in the ligature replacement
>> of the shaping function.
>>
>> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
>> fribidi_shape_arabic_ligature(), which replaces the first char of
>> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.
>
> That's expected. It will be removed by fribidi_remove_bidi_marks().
>
>> Behdad, why don't you use U+FFFF for this purpose?
>
> Because U+FFFF is not a valid character. U+FEFF is harmless at that
> position. It also has a BN bidi category and is one of the characters
> that any rendering pipeline removes anyway.
>
>> and why don't
>> fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
>> clean up these CHAR_FILLs?
>
> The shaping functions do not remove any characters. If they did it
> would be hard to keep the mapping between visual and logical strings.
> The idea is that we add filler chars when needing to remove any
> characters, and fribidi_remove_bidi_marks() or similar functions will
> remove the fillers later.
>
>> Behdad, I can fix the problem in CVS if you tell me what's the best
>> way to fix this.
>
> No fix needed AFAI'm concerned :).
So, practically, I need to call fribidi_remove_bidi_marks after calling
log2vis? I see that's done by fribidi.exe if I pass in --clean.
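
For reference, the filler approach described in the quoted reply can be sketched in a few lines of self-contained C. This is only an illustration of the idea, not FriBidi's code: strip_fillers() and FILL_CHAR are hypothetical names, the sketch assumes UTF-32 code points in a plain array, and it only drops U+FEFF, whereas fribidi_remove_bidi_marks() presumably handles other bidi marks as well and has its own signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* U+FEFF (ZWNBSP/BOM): per the thread, the value of FRIBIDI_CHAR_FILL.
 * FILL_CHAR itself is a made-up name for this sketch. */
#define FILL_CHAR 0xFEFFu

/*
 * Hypothetical helper, NOT part of FriBidi.
 * Compacts `str` in place by dropping every filler character.
 * If `old_index` is non-NULL, old_index[j] receives the index that the
 * character now at position j had before the cleanup, so a caller can
 * still relate the cleaned string to the uncleaned one.
 * Returns the new length.
 */
size_t strip_fillers(uint32_t *str, size_t len, size_t *old_index)
{
    assert(str != NULL || len == 0);

    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (str[in] == FILL_CHAR)
            continue;               /* skip the placeholder */
        str[out] = str[in];         /* shift the kept character left */
        if (old_index != NULL)
            old_index[out] = in;    /* remember where it came from */
        out++;
    }
    return out;
}
```

Because the shaping step only overwrites a character with the filler and never shortens the string, every index stays valid until this single cleanup pass, which is the point made in the reply about keeping the visual/logical mapping.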
Thanks,
--
Yoann Roman