[FriBidi] log2vis() misbehaving with Arabic text?

Tue Oct 28 10:53:03 PDT 2014

On Tue, Oct 28, 2014 at 1:18 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
> On 14-10-28 10:09 AM, Philip Semanchuk wrote:
>> I took your advice and tested U+200C in a PDF. Both Acrobat Reader and
>> my default PDF reader (Preview -- I'm on OS X) render it as a vertical
>> bar. That was a surprise; I thought it would either be invisible or render
>> as the standard "unprintable character" rectangle.
>
> It's from a broken PDF generator.  The vertical bar is what "show format
> characters" in MS apps is supposed to show.  I.e., the font has that glyph, but
> the shaping engine (e.g., part of what renders to PDF) should know not to show
> it normally.

So you're saying that characters like U+200C and U+200D are like
processing instructions to the PDF generator (reportlab, in this case)
that should inform the text layout engine but should then be stripped?
In other words, they should not appear in the generated .pdf file?
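For reference, a short check with Python's standard unicodedata module shows that the three characters discussed here are all format controls (Unicode general category Cf). They influence layout, such as joining or line breaking, but are not meant to be drawn as glyphs in normal rendering. The helper name `describe` is illustrative only.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]

for row in describe("\u200c\u200d\ufeff"):
    print(*row)
# U+200C ZERO WIDTH NON-JOINER Cf
# U+200D ZERO WIDTH JOINER Cf
# U+FEFF ZERO WIDTH NO-BREAK SPACE Cf
```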

>> I had read the documentation for fribidi_remove_bidi_marks() but I
>> didn't think it removed U+FEFF.
>
> I just tested and looks like it does.

Great!

>> Is this correct pseudo-code?
>>
>> sentence = fribidi_log2vis(sentence)
>> sentence = fribidi_remove_bidi_marks(sentence)
>> sentence = sentence.replace(ZWNBSP, '')
>>
>> I find the man pages for the fribidi functions helpful, but I can't
>> find documentation on how to use them together.
>
> Right.  Check fribidi-main.c and fribidi_log2vis() implementations.  In this
> case, ./fribidi --clean does what you want.

Good to know -- thanks for the tip.
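The cleanup step can be roughly approximated in pure Python for text that is already in visual order (for example, the output of fribidi_log2vis()). This is a sketch, not FriBidi's code. It assumes fribidi_remove_bidi_marks() drops explicit directional controls, characters of bidi class BN (which include U+200C, U+200D and U+FEFF), and LRM/RLM. The exact set it removes should be confirmed against fribidi-main.c. The helper `strip_bidi_marks` is hypothetical and is not part of any FriBidi binding.

```python
import unicodedata

# Explicit embedding, override and isolate classes from the Unicode
# Bidirectional Algorithm (UAX #9). Assumed here to match what FriBidi strips.
EXPLICIT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

# LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK have ordinary bidi classes
# (L and R), so they are matched by code point instead.
MARKS = {"\u200e", "\u200f"}

def strip_bidi_marks(visual):
    """Drop explicit bidi controls, BN characters (e.g. ZWNJ, ZWJ, ZWNBSP) and LRM/RLM."""
    kept = []
    for ch in visual:
        bidi_class = unicodedata.bidirectional(ch)
        if ch in MARKS or bidi_class in EXPLICIT_CLASSES or bidi_class == "BN":
            continue
        kept.append(ch)
    return "".join(kept)

print(repr(strip_bidi_marks("x\u200cy\u200dz\ufeff")))  # 'xyz'
```

Unlike the C function, this sketch does not update position maps or embedding levels. It only returns the cleaned string. Note also that BN covers more than the three characters above (for example U+00AD SOFT HYPHEN and several control characters), so this should run only after shaping and line breaking are finished.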

Bye for now
Philip