[FriBidi] log2vis() misbehaving with Arabic text?

Behdad Esfahbod behdad at behdad.org
Tue Oct 28 10:18:00 PDT 2014


On 14-10-28 10:09 AM, Philip Semanchuk wrote:
> On Tue, Oct 28, 2014 at 4:45 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
>> On 14-10-27 10:47 AM, Philip Semanchuk wrote:
>>> On Mon, Oct 27, 2014 at 1:26 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
>>>> On 14-10-27 08:39 AM, Philip Semanchuk wrote:
>>>>> I need to play around with it a little, though. For instance, I saw
>>>>> one case where the PDF rendered an unprintable character where
>>>>> log2vis() had inserted a ZWNBSP (0xfeff) into a string. Technically a
>>>>> ZWNBSP should be harmless but...
>>>>
>>>> Right.  FriBidi inserts U+FEFF when it needs to delete a character slot.  The
>>>> FriBidi user should either remove those from the stream or make sure they
>>>> render to nothing.  That sounds like a ReportLab bug.
>>>
>>> Yes, one could also argue that it's my PDF viewer that's at fault.
>>
>> Not really.  The PDF viewer gets exact instructions about what to show...
>> It's the PDF generator that decides.
> 
> I took your advice and tested U+200C in a PDF. Both Acrobat Reader and
> my default PDF reader (Preview -- I'm on OS X) render it as a vertical
> bar. That was a surprise; I thought it would either be invisible or render
> as the standard "unprintable character" rectangle.

It's from a broken PDF generator.  The vertical bar is what "show format
characters" in MS apps is supposed to show.  I.e., the font has that glyph, but
the shaping engine (e.g., the part of whatever renders to PDF) should know not
to show it normally.
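
If the text has to go through a generator that does no shaping anyway, one
stopgap is to drop the format characters before rendering.  A minimal sketch in
Python, assuming the string has already been shaped and reordered by log2vis():
it removes every code point in Unicode general category Cf, which covers U+200C
(ZWNJ), U+200D (ZWJ), U+FEFF (ZWNBSP), the bidi controls, and also soft hyphens
(U+00AD).  Do not apply it before shaping, because ZWNJ and ZWJ change how
Arabic letters join.  `hide_format_chars` is a hypothetical helper name.

```python
import unicodedata


def hide_format_chars(text: str) -> str:
    """Drop every code point whose Unicode general category is Cf (format).

    Intended only for already-shaped, visual-order text that will be handed
    to a renderer which would otherwise draw whatever glyph the font maps
    these characters to (e.g. the vertical bar seen in the PDFs above).
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

This is deliberately blunt: a shaping engine such as HarfBuzz decides per
character what to hide, which is the real fix.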


> BTW someone else is having the same problem; here's a PNG:
> http://www.princexml.com/forum/post/12999/attachment/strange_vertical_line.png

I'm not surprised...  It's everywhere when you don't use something like HarfBuzz.


> It's from this conversation:
> http://www.princexml.com/forum/topic/2776/solaiman-lipi-font-in-bangla-is-not-being-rendered-properly
> 
>>> This is one of the things I need to experiment with.
>>>
>>> Removing ZWNBSP is easy enough. Is any other postprocessing needed
>>> after calling log2vis()?
>>
>> Well, there are more characters that need to be hidden.  Check
>> fribidi_remove_bidi_marks().  That function was deprecated by mistake, and if
>> I recall correctly there is no replacement for it.
> 
> I had read the documentation for fribidi_remove_bidi_marks() but I
> didn't think it removed U+FEFF.

I just tested it, and it looks like it does.
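
A rough Python approximation of the filtering rule in
fribidi_remove_bidi_marks(), based on a reading of FriBidi's C source rather
than its documentation, so treat the exact set of classes as an assumption: it
drops characters whose bidi class is an explicit embedding/override (and, in
newer releases, an isolate) or BN (boundary neutral), plus LRM and RLM.  U+FEFF
has bidi class BN, which is why it disappears.  Unlike the C function, this
sketch does not update the position maps or embedding levels, and Python's
Unicode data version may differ slightly from FriBidi's.
`strip_bidi_marks_approx` is a hypothetical name.

```python
import unicodedata

# Bidi classes assumed to be removed by fribidi_remove_bidi_marks().
_DROPPED_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",  # explicit embeddings / overrides
    "LRI", "RLI", "FSI", "PDI",         # isolates (newer FriBidi only)
    "BN",                               # boundary neutrals: U+FEFF, ZWNJ, ZWJ, ...
}
# LRM and RLM are strong characters (classes L and R), so list them explicitly.
_MARKS = {"\u200e", "\u200f"}


def strip_bidi_marks_approx(visual: str) -> str:
    """Remove bidi marks and boundary neutrals from a visual-order string."""
    return "".join(
        ch
        for ch in visual
        if ch not in _MARKS
        and unicodedata.bidirectional(ch) not in _DROPPED_CLASSES
    )
```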


> Is this correct pseudo-code?
> 
> sentence = fribidi_log2vis(sentence)
> sentence = fribidi_remove_bidi_marks(sentence)
> sentence = sentence.replace(ZWNBSP, '')
> 
> I find the man pages for the fribidi functions helpful, but I can't
> find documentation on how to use them together.

Right.  Check fribidi-main.c and the fribidi_log2vis() implementation.  In
this case, ./fribidi --clean does what you want.
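
From Python, the most direct way to get exactly what `fribidi --clean` does is
to shell out to the command-line tool.  A minimal sketch, assuming the
`fribidi` binary is on PATH and uses its default UTF-8 charset; `--nopad`,
`--nobreak`, `--rtl` and `--ltr` are assumed from the tool's usual `--help`
output, so verify them against the installed version.  `fribidi_command` and
`log2vis_clean` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def fribidi_command(rtl: Optional[bool] = None) -> List[str]:
    """Build the argv for the fribidi CLI in clean mode.

    --clean drops explicit bidi marks (and U+FEFF) from the visual output.
    --nopad / --nobreak (assumed flags) stop the tool from right-justifying
    RTL lines and wrapping them at the terminal width.
    rtl=None leaves base-direction detection to fribidi's default.
    """
    cmd = ["fribidi", "--clean", "--nopad", "--nobreak"]
    if rtl is True:
        cmd.append("--rtl")
    elif rtl is False:
        cmd.append("--ltr")
    return cmd


def log2vis_clean(line: str, rtl: Optional[bool] = None) -> str:
    """Convert one logical-order line to cleaned visual order via the CLI."""
    if shutil.which("fribidi") is None:
        raise RuntimeError("the fribidi command-line tool is not on PATH")
    result = subprocess.run(
        fribidi_command(rtl),
        input=line + "\n",   # the tool processes its input line by line
        capture_output=True,
        encoding="utf-8",    # text mode, matching fribidi's default charset
        check=True,          # raise CalledProcessError on a non-zero exit
    )
    # One input line yields one output line; drop the trailing newline.
    return result.stdout.rstrip("\n")
```

Since the clean step already removes U+FEFF, the separate replace() in the
pseudo-code should not be needed.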

-- 
behdad
http://behdad.org/


More information about the fribidi mailing list