[FriBidi] Invalid UTF-8 for Arabic
Behdad Esfahbod
behdad at behdad.org
Wed Mar 11 07:01:49 PDT 2009
On 03/10/2009 06:08 AM, Behnam Esfahbod ZWNJ wrote:
> Seems the BOM mark appears in the place of the first character of the
> LAM+ALEF ligature. This might be a bug in the ligature replacement of
> the shaping function.
>
> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
> fribidi_shape_arabic_ligature(), which replaces the first char of
> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.
That's expected. It will be removed by fribidi_remove_bidi_marks().
> Behdad, why don't you use U+FFFF for this purpose?
Because U+FFFF is not a valid character. U+FEFF is harmless at that position:
it has a BN bidi category and is one of the characters that any rendering
pipeline removes anyway.
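
To illustrate the distinction drawn above, here is a minimal sketch of a check for Unicode noncharacters. It is not FriBidi code: sketch_is_noncharacter() is a hypothetical helper, written only from the Unicode definition (U+FDD0..U+FDEF, plus the last two code points of every plane). U+FFFF falls in that set, while U+FEFF is an ordinary assigned character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of FriBidi.
 * Returns true if cp is one of the 66 Unicode noncharacters. */
static bool sketch_is_noncharacter(uint32_t cp)
{
    assert(cp <= 0x10FFFFu);            /* must be inside the code space */

    /* Contiguous block of 32 noncharacters in the BMP. */
    if (cp >= 0xFDD0u && cp <= 0xFDEFu)
        return true;

    /* Last two code points of each plane: U+nFFFE and U+nFFFF. */
    return (cp & 0xFFFEu) == 0xFFFEu;
}
```

With this check, 0xFFFF is reported as a noncharacter and 0xFEFF is not, which matches the reasoning for picking U+FEFF as the filler.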
> and why don't
> fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
> clean up these CHAR_FILLs?
The shaping functions do not remove any characters. If they did, it would be
hard to keep the mapping between visual and logical strings. The idea is that
we add filler chars wherever a character would otherwise be removed, and
fribidi_remove_bidi_marks() or similar functions remove the fillers later.
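
The two-pass idea described above can be sketched roughly as follows. This is a simplified illustration, not the FriBidi implementation: every sketch_* name is hypothetical, only the isolated LAM+ALEF ligature (U+FEFB) is handled, and the cleanup pass strips only the filler, whereas the real fribidi_remove_bidi_marks() does more than that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are FriBidi API. */
#define SKETCH_CHAR_FILL         0xFEFFu /* ZWNBSP/BOM, used as filler */
#define SKETCH_LAM               0x0644u
#define SKETCH_ALEF              0x0627u
#define SKETCH_LAM_ALEF_ISOLATED 0xFEFBu

/* Pass 1: shape in place. The first char of each LAM+ALEF pair becomes
 * a filler, so the string length and all indices stay unchanged. */
static void sketch_shape_ligatures(uint32_t *str, size_t len)
{
    assert(str != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++) {
        if (str[i] == SKETCH_LAM && str[i + 1] == SKETCH_ALEF) {
            str[i] = SKETCH_CHAR_FILL;
            str[i + 1] = SKETCH_LAM_ALEF_ISOLATED;
            i++; /* skip the ligature we just wrote */
        }
    }
}

/* Pass 2: drop fillers, compacting the string and its parallel position
 * map together. pos[i] is the original index of str[i].
 * Returns the new length. */
static size_t sketch_remove_fillers(uint32_t *str, size_t *pos, size_t len)
{
    assert((str != NULL && pos != NULL) || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (str[i] == SKETCH_CHAR_FILL)
            continue;
        str[out] = str[i];
        pos[out] = pos[i];
        out++;
    }
    return out;
}
```

For the input BEH, LAM, ALEF, MEEM with position map {0, 1, 2, 3}, the first pass leaves four elements (the LAM slot now holds the filler), and the second pass returns length 3 with the map {0, 2, 3}, so each remaining character still points back to its original index.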
> Behdad, I can fix the problem in CVS if you tell me what's the best
> way to fix this.
No fix needed as far as I'm concerned :).
behdad
> -Behnam
>
>