[FriBidi] log2vis() misbehaving with Arabic text?

Philip Semanchuk osvenskan at gmail.com
Tue Oct 28 10:53:03 PDT 2014


On Tue, Oct 28, 2014 at 1:18 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
> On 14-10-28 10:09 AM, Philip Semanchuk wrote:
>> I took your advice and tested U+200C in a PDF. Both Acrobat Reader and
>> my default PDF reader (Preview -- I'm on OS X) render it as a vertical
>> bar. That's a surprise; I thought it would either be invisible or render
>> as the standard "unprintable character" rectangle.
>
> It's from a broken PDF generator.  The vertical bar is what "show format
> characters" in MS apps is supposed to show.  I.e., the font has that glyph,
> but the shaping engine (e.g., part of what renders to PDF) should know not
> to show it normally.

So you're saying that characters like U+200C and U+200D are like
processing instructions to the PDF generator (reportlab, in this case)
that should inform the text layout engine but should then be stripped?
In other words, they should not appear in the generated .pdf file?

>> I had read the documentation for fribidi_remove_bidi_marks() but I
>> didn't think it removed U+FEFF.
>
> I just tested and looks like it does.

Great!

>> Is this correct pseudo-code?
>>
>> sentence = fribidi_log2vis(sentence)
>> sentence = fribidi_remove_bidi_marks(sentence)
>> sentence = sentence.replace(ZWNBSP, '')
>>
>> I find the man pages for the fribidi functions helpful, but I can't
>> find documentation on how to use them together.
>
> Right.  Check fribidi-main.c and fribidi_log2vis() implementations.  In this
> case, ./fribidi --clean does what you want.

Good to know -- thanks for the tip.
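
As a rough illustration of the cleanup step discussed above, here is a
minimal pure-Python sketch that strips bidi marks and other invisible
format characters from a string that has already been reordered. It is
an approximation built on the standard unicodedata module, not FriBidi's
own implementation, and the name strip_format_marks is hypothetical. It
assumes the characters worth dropping are the explicit embedding,
override and isolate controls, anything in the BN (boundary neutral)
class, and LRM/RLM.

```python
import unicodedata

# Bidi classes treated as removable in this sketch: explicit embeddings and
# overrides, isolates, and boundary neutrals. U+200C (ZWNJ), U+200D (ZWJ)
# and U+FEFF (ZWNBSP) all have class BN.
_REMOVABLE_CLASSES = {
    "LRE", "RLE", "PDF", "LRO", "RLO",
    "LRI", "RLI", "FSI", "PDI", "BN",
}

# LRM and RLM carry strong classes (L and R), so they are listed explicitly.
_REMOVABLE_CHARS = {"\u200e", "\u200f"}


def strip_format_marks(visual_text):
    """Return visual_text without bidi marks and BN format characters.

    Hypothetical helper: an approximation of the cleanup step, meant to run
    after reordering and shaping, since ZWNJ/ZWJ affect how letters join.
    """
    return "".join(
        ch
        for ch in visual_text
        if ch not in _REMOVABLE_CHARS
        and unicodedata.bidirectional(ch) not in _REMOVABLE_CLASSES
    )
```

With something like this applied last, the separate
sentence.replace(ZWNBSP, '') step in the pseudo-code becomes redundant,
because U+FEFF is already covered by the BN class.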

Bye for now
Philip

