[FriBidi] MS-Word bidirectional behaviour

Shachar Shemesh shachar at shemesh.biz
Sat Dec 10 12:19:47 UTC 2016


On 09/12/16 20:37, Eduardo Castiñeyra wrote:
> Hi guys,
>
> We have users in Iran who are complaining about our app not behaving
> the way MS-Word does when it comes to directional ordering.
>
> In most applications, if one writes the sentence "In an attack of an
> F14 780 people died" in Persian with no RTL marks one gets the following:
>
> هواپیماهای F14 ۷۸۰ نفر را مصدوم کردند.
>
> Obviously the ۷۸۰ number is missplaced, it should be on the left side
> of F14. Even if the numerals were Persian, most applications get the
> ۷۸۰ in the wrong position, and so does FriBiDi. My understanding is
> there is only two ways of fixing it
>
> 1) Force the user to insert an RTL mark after F14
> 2) Detect that ۷۸۰ is writen in Persian numerals and automatically
> treat it as an RTL run (maybe FriBiDi should do that?)
>
> However, somehow MS-Word detects when the user changes the keyboard
> layout and that affects the ordering as shown in the following picture.
>
> https://snag.gy/pbsh7g.jpg 
FriBidi implements the "Unicode Bidi Algorithm" (henceforth, UBA). It is
defined in technical report #9 of the Unicode Consortium. You can view
it at http://unicode.org/reports/tr9/.

You can see the BiDi parsing of the sentence you wrote at
http://unicode.org/cldr/utility/bidi.jsp?a=%D9%87%D9%88%D8%A7%D9%BE%DB%8C%D9%85%D8%A7%D9%87%D8%A7%DB%8C+F14+%DB%B7%DB%B8%DB%B0+%D9%86%D9%81%D8%B1+%D8%B1%D8%A7+%D9%85%D8%B5%D8%AF%D9%88%D9%85+%DA%A9%D8%B1%D8%AF%D9%86%D8%AF.+&p=RTL

That tool follows an older version of the UBA, but I don't see anything
in the newer versions that should make a difference here.

You can see that your problem is that both the 14 and the 780 are
categorized as BiDi class EN (European number). This means that the
space between them is a neutral between two runs of the same direction,
so it gets a left-to-right direction and "F14 ۷۸۰" is laid out as a
single left-to-right run, hence your problem.
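
If you want to check the classes yourself without the web tool, here is a
minimal sketch using Python's standard unicodedata module. The sample
string and the helper name bidi_classes are my own choices for
illustration; it only looks up the per-character BiDi class and does not
run the UBA itself.

```python
import unicodedata

# The relevant part of the sample sentence: Latin "F14", a space, then
# the number 780 written with Extended Arabic-Indic (Persian) digits.
sample = "F14 \u06F7\u06F8\u06F0"


def bidi_classes(text):
    """Return (character name, BiDi class) pairs, one per character."""
    return [
        (unicodedata.name(ch, repr(ch)), unicodedata.bidirectional(ch))
        for ch in text
    ]


for name, cls in bidi_classes(sample):
    print(f"{cls:>3}  {name}")
```

Both the ASCII digits and the Persian digits should come out as EN, with
only a WS (whitespace) between them.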

Within the UBA (which is what FriBidi implements), I'm afraid there is
no solution other than to insert an RLM (right-to-left mark, U+200F), as
you've suggested (parsed sentence:
http://unicode.org/cldr/utility/bidi.jsp?a=%D9%87%D9%88%D8%A7%D9%BE%DB%8C%D9%85%D8%A7%D9%87%D8%A7%DB%8C+F14+%E2%80%8F%DB%B7%DB%B8%DB%B0+%D9%86%D9%81%D8%B1+%D8%B1%D8%A7+%D9%85%D8%B5%D8%AF%D9%88%D9%85+%DA%A9%D8%B1%D8%AF%D9%86%D8%AF.+&p=RTL)
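
If you would rather have the application insert the RLM than ask users
to type it, something along these lines might serve as a starting point.
This is only a sketch under my own assumptions: the helper name and the
heuristic (an ASCII letter or digit, then whitespace, then a Persian
digit) are hypothetical, are not part of FriBidi or the UBA, and will
misfire in text where the left-to-right ordering was actually intended.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Whitespace that follows an ASCII letter or digit and is followed by an
# Extended Arabic-Indic (Persian) digit, U+06F0..U+06F9.
_GAP = re.compile(r"(?<=[A-Za-z0-9])(\s+)(?=[\u06F0-\u06F9])")


def add_rlm_before_persian_number(text):
    """Hypothetical helper: keep the whitespace, then append an RLM to it."""
    return _GAP.sub(lambda m: m.group(1) + RLM, text)


fixed = add_rlm_before_persian_number("F14 \u06F7\u06F8\u06F0")
print([f"U+{ord(ch):04X}" for ch in fixed])
```

The RLM ends up directly before the first Persian digit, which is the
same position as in the parsed sentence linked above. Running the helper
a second time changes nothing, because the whitespace is then followed by
the RLM and not by a digit.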

As for Word, the reason it "works" is that it does not use the UBA to
render BiDi text. It actually saves the keyboard language with which
each letter was typed. This is both non-standard (obviously) and
error-prone. In my experience, it generates a lot of user confusion as
to how to type things so that they turn out correctly on screen.

In short, I have done nothing to help you solve your problem, but I hope
you now understand it better :-)

Shachar