[FriBidi] Invalid UTF-8 for Arabic

Behdad Esfahbod behdad at behdad.org
Wed Mar 11 07:50:29 PDT 2009


On 03/11/2009 10:41 AM, Yoann Roman wrote:
>>> Seems the BOM mark appears in the place of the first character of the
>>> LAM+ALEF ligature.  This might be a bug in the ligature replacement
>>> of the shaping function.
>>>
>>> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
>>> fribidi_shape_arabic_ligature(), which replaces the first char of
>>> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.
>> That's expected.  It will be removed by fribidi_remove_bidi_marks().
>>
>>> Behdad, why don't you use U+FFFF for this purpose?
>> Because U+FFFF is not a valid character.  U+FEFF is harmless at that
>> position: it has a BN bidi category and is one of the characters
>> that any rendering pipeline removes anyway.
>>
>>> and why don't
>>> fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
>>> clean these CHAR_FILLs?
>> The shaping functions do not remove any characters.  If they did it
>> would be hard to keep the mapping between visual and logical strings.
>> The idea is that we add filler chars when needing to remove any
>> characters, and fribidi_remove_bidi_marks() or similar functions will
>> remove the fillers later.
>>
>>> Behdad, I can fix the problem in CVS if you tell me what's the best
>>> way to fix this.
>> No fix needed AFAI'm concerned :).
>
> So, practically, I need to call fribidi_remove_bidi_marks after calling
> log2vis? I see that's done by fribidi.exe if I pass in --clean.

Yes.
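
To illustrate the fill-then-remove idea discussed above, here is a minimal,
self-contained sketch.  It does not call FriBidi and is not FriBidi's actual
code: the sketch_* names, the hard-coded LAM+ALEF pair, and the choice of
U+FEFB as the ligature glyph are assumptions made for illustration only.
The first pass keeps the string length fixed by writing a U+FEFF filler, so
any index mapping stays valid; the second pass drops the fillers and
compacts a positions array in step with the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (not taken from FriBidi headers). */
#define SKETCH_CHAR_FILL 0xFEFFu /* ZWNBSP/BOM, the filler named in the thread */
#define SKETCH_LAM       0x0644u /* ARABIC LETTER LAM */
#define SKETCH_ALEF      0x0627u /* ARABIC LETTER ALEF */
#define SKETCH_LAM_ALEF  0xFEFBu /* ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM */

/* Shaping-style pass: replaces each LAM+ALEF pair with filler + ligature.
 * The length never changes, so existing index maps remain valid. */
void sketch_shape_lam_alef(uint32_t *str, size_t len)
{
    assert(str != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++) {
        if (str[i] == SKETCH_LAM && str[i + 1] == SKETCH_ALEF) {
            str[i] = SKETCH_CHAR_FILL;    /* first char of the pair becomes filler */
            str[i + 1] = SKETCH_LAM_ALEF; /* second char carries the ligature */
            i++;                          /* skip past the ligature slot */
        }
    }
}

/* Cleanup pass: removes fillers in place and returns the new length.
 * If positions is non-NULL, its entries are compacted alongside the string,
 * so positions[k] still names the original index of the k-th kept char. */
size_t sketch_remove_fill(uint32_t *str, size_t len, size_t *positions)
{
    assert(str != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (str[i] == SKETCH_CHAR_FILL)
            continue;
        str[out] = str[i];
        if (positions != NULL)
            positions[out] = positions[i];
        out++;
    }
    return out;
}
```

With real FriBidi the cleanup step is the fribidi_remove_bidi_marks() call
mentioned above; the sketch only shows why deferring removal to a separate
pass makes the index bookkeeping simple.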

behdad

> Thanks,
>

