[FriBidi] Invalid UTF-8 for Arabic

Behnam Esfahbod ZWNJ behnam at zwnj.org
Wed Mar 11 12:59:21 PDT 2009


Got it.  Thanks for the explanation. :)


On Wed, Mar 11, 2009 at 5:31 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
> On 03/10/2009 06:08 AM, Behnam Esfahbod ZWNJ wrote:
>
>> It seems the BOM appears in place of the first character of the
>> LAM+ALEF ligature.  This might be a bug in the ligature replacement of
>> the shaping function.
>>
>> Yep, it's in CVS HEAD: fribidi_shape_arabic_ligature() in
>> lib/fribidi-arabic.c replaces the first char of the ligature with
>> FRIBIDI_CHAR_FILL, which is U+FEFF (ZWNBSP/BOM).
>
> That's expected.  It will be removed by fribidi_remove_bidi_marks().
>
>> Behdad, why don't you use U+FFFF for this purpose?
>
> Because U+FFFF is not a valid character.  U+FEFF is harmless at that
> position: it has the BN bidi category and is one of the characters that
> any rendering pipeline removes anyway.
>
>> And why don't fribidi_shape_arabic() or
>> fribidi_shape_arabic_ligature() clean up these CHAR_FILLs?
>
> The shaping functions do not remove any characters.  If they did, it
> would be hard to keep the mapping between visual and logical strings.
> The idea is that we insert a filler char whenever a character needs to
> be removed, and fribidi_remove_bidi_marks() or similar functions remove
> the fillers later.
>
>> Behdad, I can fix the problem in CVS if you tell me what the best
>> way to fix this is.
>
> No fix needed as far as I'm concerned :).
>
> behdad
>
>> -Behnam
>>
>>
>
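
For anyone reading this thread later, here is a minimal, self-contained sketch of the filler idea described above.  It does not call FriBidi, and it is not FriBidi's actual code: toy_shape_lam_alef(), toy_remove_fillers(), and the positions[] array are hypothetical names made up for illustration.  The only values taken from the thread are the filler code point U+FEFF and the fact that the first char of the ligature is the one replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* U+FEFF (ZWNBSP/BOM), the value the thread gives for FRIBIDI_CHAR_FILL. */
#define TOY_CHAR_FILL      0xFEFFu
#define TOY_LAM            0x0644u  /* ARABIC LETTER LAM */
#define TOY_ALEF           0x0627u  /* ARABIC LETTER ALEF */
#define TOY_LAM_ALEF_ISOL  0xFEFBu  /* LAM WITH ALEF ligature, isolated form */

/* Hypothetical shaping pass: every LAM+ALEF pair becomes FILL+ligature.
 * The string length never changes, so index i still refers to the same
 * slot before and after shaping. */
static void toy_shape_lam_alef(uint32_t *str, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (str[i] == TOY_LAM && str[i + 1] == TOY_ALEF) {
            str[i] = TOY_CHAR_FILL;         /* first char of the pair */
            str[i + 1] = TOY_LAM_ALEF_ISOL; /* ligature glyph code point */
            i++;                            /* skip the slot just written */
        }
    }
}

/* Hypothetical cleanup pass: drop the fillers and compact positions[]
 * in lockstep, so positions[k] still names the original index of the
 * character now stored at str[k].  Returns the new length. */
static size_t toy_remove_fillers(uint32_t *str, size_t *positions, size_t len)
{
    assert(str != NULL && positions != NULL);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (str[i] != TOY_CHAR_FILL) {
            str[out] = str[i];
            positions[out] = positions[i];
            out++;
        }
    }
    return out;
}
```

Given the four code points U+0628 U+0644 U+0627 U+0645 with positions {0, 1, 2, 3}, the shaping pass leaves the length at 4 and puts U+FEFF in slot 1.  The cleanup pass then returns 3 and leaves positions as {0, 2, 3}, so the ligature can still be traced back to where it came from.  Doing the removal inside the shaping pass would shift every later index before the caller had a chance to record that mapping.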



-- 
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '
  *  ..   http://behnam.esfahbod.info
 *  `  *
  * o *   http://zwnj.org

