[FriBidi] Invalid UTF-8 for Arabic
Behnam Esfahbod ZWNJ
behnam at zwnj.org
Wed Mar 11 12:59:21 PDT 2009
Got it. Thanks for the explanation. :)
On Wed, Mar 11, 2009 at 5:31 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
> On 03/10/2009 06:08 AM, Behnam Esfahbod ZWNJ wrote:
>
>> It seems the BOM appears in place of the first character of the
>> LAM+ALEF ligature. This might be a bug in the ligature replacement of
>> the shaping function.
>>
>> Yep, it's in the CVS HEAD, lib/fribidi-arabic.c
>> fribidi_shape_arabic_ligature(), which replaces the first char of
>> ligature with FRIBIDI_CHAR_FILL, which is ZWNBSP/BOM.
>
> That's expected. It will be removed by fribidi_remove_bidi_marks().
>
>> Behdad, why don't you use U+FFFF for this purpose?
>
> Because U+FFFF is not a valid character. U+FEFF is harmless at that
> position: it has the BN bidi category and is one of the characters that any
> rendering pipeline removes anyway.
>
>> And why don't fribidi_shape_arabic() or fribidi_shape_arabic_ligature()
>> clean up these CHAR_FILLs?
>
> The shaping functions do not remove any characters. If they did, it would be
> hard to keep the mapping between the visual and logical strings. The idea is
> that we insert filler chars wherever characters need to be removed, and
> fribidi_remove_bidi_marks() or similar functions remove the fillers
> later.
>
>> Behdad, I can fix the problem in CVS if you tell me what's the best
>> way to fix this.
>
> No fix needed as far as I'm concerned :).
>
> behdad
>
>> -Behnam
>>
>>
>
--
' بهنام اسفهبد
' Behnam Esfahbod
'
* .. http://behnam.esfahbod.info
* ` *
* o * http://zwnj.org