[FriBidi] Invalid UTF-8 for Arabic

Behdad Esfahbod behdad at behdad.org
Wed Mar 11 07:01:49 PDT 2009


On 03/10/2009 06:08 AM, Behnam Esfahbod ZWNJ wrote:

> Seems the BOM mark appears in the place of the first character of the
> LAM+ALEF ligature.  This might be a bug in the ligature replacement of
> the shaping function.
>
> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
> fribidi_shape_arabic_ligature(), which replaces the first char of
> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.

That's expected.  It will be removed by fribidi_remove_bidi_marks().

> Behdad, why don't you use U+FFFF for this purpose?

Because U+FFFF is not a valid character.  U+FEFF is harmless at that position:
it has a BN bidi category and is one of the characters that any rendering
pipeline removes anyway.

> and why
> don't fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
> clean up these CHAR_FILLs?

The shaping functions do not remove any characters.  If they did, it would be
hard to keep the mapping between visual and logical strings.  The idea is that
we add filler chars whenever we need to remove a character, and
fribidi_remove_bidi_marks() or similar functions remove the fillers later.
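
To make the fill-then-strip idea concrete, here is a minimal standalone sketch.
It does not call FriBidi at all: sketch_shape_lam_alef() and
sketch_remove_fill() are hypothetical stand-ins for the ligature step and the
later cleanup pass, and the only value taken from this thread is U+FEFF as the
filler.  It handles just the isolated LAM+ALEF form and ignores joining
context, so treat it as an illustration rather than the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAR_FILL              0xFEFFu /* ZWNBSP/BOM, the filler named in this thread */
#define CHAR_LAM               0x0644u /* ARABIC LETTER LAM */
#define CHAR_ALEF              0x0627u /* ARABIC LETTER ALEF */
#define CHAR_LAM_ALEF_ISOLATED 0xFEFBu /* ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM */

/* Hypothetical stand-in for the ligature step.  The string length never
 * changes: the first char of each LAM+ALEF pair becomes the filler and the
 * second becomes the ligature, so every index still means what it meant
 * before shaping. */
void sketch_shape_lam_alef(uint32_t *str, size_t len)
{
    assert(str != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++) {
        if (str[i] == CHAR_LAM && str[i + 1] == CHAR_ALEF) {
            str[i] = CHAR_FILL;
            str[i + 1] = CHAR_LAM_ALEF_ISOLATED;
            i++; /* skip the ligature we just wrote */
        }
    }
}

/* Hypothetical stand-in for the cleanup pass.  Drops every filler, compacts
 * the string in place, and keeps the optional positions[] array in step so
 * each surviving char still knows its original index.  Returns the new
 * length. */
size_t sketch_remove_fill(uint32_t *str, size_t *positions, size_t len)
{
    size_t out = 0;
    assert(str != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (str[i] == CHAR_FILL)
            continue;
        str[out] = str[i];
        if (positions != NULL)
            positions[out] = positions[i];
        out++;
    }
    return out;
}
```

For the four-char input SEEN, LAM, ALEF, MEEM with positions 0..3, shaping
leaves the length at 4 with the filler at index 1.  The cleanup pass then
returns length 3 with positions 0, 2, 3, so the ligature still points back at
the original ALEF slot.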

> Behdad, I can fix the problem in CVS if you tell me what's the best
> way to fix this.

No fix needed AFAI'm concerned :).

behdad

> -Behnam
>
>

