[FriBidi] log2vis() misbehaving with Arabic text?

Tue Oct 21 11:42:48 PDT 2014

Hi all,
I’m a non-Arabic speaker working with Arabic text, and I sometimes see
results from log2vis() that don’t look quite right.

I have a test where I compare the results from two Arabic text reshapers.
The first is pyfribidi, which is a thin wrapper around fribidi-0.19.6. The
second reshaper is a Python port of Better-Arabic-Reshaper:
https://github.com/agawish/Better-Arabic-Reshaper/

The two reshapers almost always agree on how to reshape a logical word.
However, in some cases they don’t. For instance, when given this logical
string:
u'\u062c\u0627\u0630\u0628\u064a\u0651\u0629'

log2vis() puts the Shadda in a different place than the BAR
(Better-Arabic-Reshaper):
log2vis: u'\ufe94\ufef4\u0651\ufe91\ufeab\ufe8e\ufe9f'
bar:     u'\ufe94\u0651\ufef4\ufe91\ufeab\ufe8e\ufe9f'
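
To make the difference easier to see, here is a minimal sketch that names
each code point in the two visual strings and reports where they differ. It
uses only the standard-library unicodedata module and does not call fribidi
or BAR; the helper names (describe, differing_positions, mark_positions) are
made up for this illustration.

```python
import unicodedata

# The two visual strings quoted above.
LOG2VIS = u'\ufe94\ufef4\u0651\ufe91\ufeab\ufe8e\ufe9f'
BAR = u'\ufe94\u0651\ufef4\ufe91\ufeab\ufe8e\ufe9f'


def describe(text):
    """Return a (code point, Unicode name) pair for each character."""
    return [(u'U+%04X' % ord(ch), unicodedata.name(ch, u'<unnamed>'))
            for ch in text]


def differing_positions(a, b):
    """Return the indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def mark_positions(text):
    """Return the indices of combining marks (nonzero combining class)."""
    return [i for i, ch in enumerate(text) if unicodedata.combining(ch)]


# Print only ASCII (code points and names) so this works on any terminal.
for label, text in ((u'log2vis', LOG2VIS), (u'bar', BAR)):
    print(label)
    for index, (code, name) in enumerate(describe(text)):
        print(u'  %d  %s  %s' % (index, code, name))

print(u'same code points: %s' % (sorted(LOG2VIS) == sorted(BAR)))
print(u'differ at indices: %s' % differing_positions(LOG2VIS, BAR))
print(u'Shadda index in log2vis: %s' % mark_positions(LOG2VIS))
print(u'Shadda index in bar: %s' % mark_positions(BAR))
```

On these two strings, the sketch shows that both outputs contain the same
code points and differ only at indices 1 and 2, where the Shadda (U+0651)
and the medial Yeh (U+FEF4) swap places.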

When I paste quoted versions of the visual representation of these strings
into Google, it finds 23 instances of the fribidi version and ~37k
instances of the BAR version. To me, that’s a pretty strong argument that
the fribidi version is incorrect.

I’m happy to file a bug if that’s appropriate, but I’d much rather learn
that I’m simply misusing the library or making some other mistake. Can
someone help me understand what I'm doing wrong?

Thanks in advance,
Philip