[FriBidi] MS-Word bidirectional behaviour

Eduardo Castiñeyra eduardo at brainstorm3d.com
Mon Dec 12 10:17:58 UTC 2016


On 12/10/2016 1:19 PM, Shachar Shemesh wrote:

> On 09/12/16 20:37, Eduardo Castiñeyra wrote:
>> Hi guys,
>>
>> We have users in Iran who are complaining about our app not behaving 
>> the way MS-Word does when it comes to directional ordering.
>>
>> In most applications, if one writes the sentence "In an attack of an 
>> F14 780 people died" in Persian with no RTL marks one gets the 
>> following:
>>
>> هواپیماهای F14 ۷۸۰ نفر را مصدوم کردند.
>>
>> Obviously the ۷۸۰ number is misplaced; it should be on the left side 
>> of F14. Even if the numerals were Persian, most applications get the 
>> ۷۸۰ in the wrong position, and so does FriBiDi. My understanding is 
>> that there are only two ways of fixing it:
>>
>> 1) Force the user to insert an RTL mark after F14
>> 2) Detect that ۷۸۰ is written in Persian numerals and automatically 
>> treat it as an RTL run (maybe FriBiDi should do that?)
>>
>> However, somehow MS-Word detects when the user changes the keyboard 
>> layout and that affects the ordering as shown in the following picture.
>>
>> https://snag.gy/pbsh7g.jpg 
> Fribidi implements the "Unicode Bidi Algorithm" (henceforth, UBA). It 
> is defined in technical report #9 of the Unicode Consortium. You can 
> view it at http://unicode.org/reports/tr9/.
>
> You can see the BiDi parsing of the sentence you wrote at 
> http://unicode.org/cldr/utility/bidi.jsp?a=%D9%87%D9%88%D8%A7%D9%BE%DB%8C%D9%85%D8%A7%D9%87%D8%A7%DB%8C+F14+%DB%B7%DB%B8%DB%B0+%D9%86%D9%81%D8%B1+%D8%B1%D8%A7+%D9%85%D8%B5%D8%AF%D9%88%D9%85+%DA%A9%D8%B1%D8%AF%D9%86%D8%AF.+&p=RTL
>
> It is according to an older spec of the UBA, but I don't see anything 
> there that should make a difference.
>
> You can see that your problem is that both the 14 and the 780 are 
> categorized as BiDi class EN. Since the last strong character before 
> them is the Latin F, both are treated as left to right. This means 
> that the space between them is a neutral between two runs of the same 
> direction, and gets a left to right direction, hence your problem.
>
> Within the UBA (which is what FriBidi is doing), I'm afraid there is 
> no solution other than to insert an RLM, as you've suggested (parsed 
> sentence: 
> http://unicode.org/cldr/utility/bidi.jsp?a=%D9%87%D9%88%D8%A7%D9%BE%DB%8C%D9%85%D8%A7%D9%87%D8%A7%DB%8C+F14+%E2%80%8F%DB%B7%DB%B8%DB%B0+%D9%86%D9%81%D8%B1+%D8%B1%D8%A7+%D9%85%D8%B5%D8%AF%D9%88%D9%85+%DA%A9%D8%B1%D8%AF%D9%86%D8%AF.+&p=RTL)
>
> As for Word, the reason it "works" is that it does not use the UBA in 
> order to render BiDi. It actually saves the keyboard language with 
> which each letter was typed. This is both non-standard (obviously) and 
> error prone. In my experience, it generates a lot of user confusion as 
> to how to type things so that they turn out correctly on screen.
>
> In short, I have done nothing to help you solve your problem, but I 
> hope you now understand it better :-)
>
> Shachar

Hi Shachar,

That actually pretty much answers my question. It confirms my 
suspicion that it is Word, and not FriBidi, that is doing it the wrong way.
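
For anyone who wants to check the classification without the online tool, here is a minimal sketch using Python's standard unicodedata module. It only looks up the raw bidi class of each character; it does not run the UBA itself. The helper name bidi_classes is made up for this illustration.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs from Python's Unicode database.

    This is only the per-character lookup that the UBA starts from,
    not the reordering algorithm itself.
    """
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# "F14 " followed by the Persian digits 7, 8, 0 (U+06F7, U+06F8, U+06F0),
# i.e. the problematic part of the sentence from this thread.
SAMPLE = "F14 \u06f7\u06f8\u06f0"

if __name__ == "__main__":
    for ch, cls in bidi_classes(SAMPLE):
        print(f"U+{ord(ch):04X}  {cls}")
```

Both the ASCII digits and the Persian digits report EN, and the space between them reports WS, which matches what Shachar describes.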

Now, I only have to convince my users :/

However, I'm not sure if I fully understand why Word's method is more 
error prone than the standard (other than relying on external data that 
would be lost when the string is copied from Word to another application).
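
An RLM, by contrast, is part of the string and survives copying. If asking users to type it is not acceptable, option 1 could perhaps be automated. The following is only a rough sketch under a strong assumption: that an RLM is wanted whenever a Latin letter/digit run is followed by whitespace and then Persian digits (U+06F0 to U+06F9). The function name insert_rlm and the pattern are my own invention, not anything FriBidi provides.

```python
import re

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Assumed heuristic: a run starting with a Latin letter (e.g. "F14"),
# then whitespace, immediately followed by a Persian digit.
_LATIN_THEN_PERSIAN_DIGIT = re.compile(
    r"([A-Za-z][A-Za-z0-9]*\s+)(?=[\u06f0-\u06f9])"
)


def insert_rlm(text):
    """Insert an RLM between a Latin run and following Persian digits.

    The mark goes after the whitespace, directly before the digits,
    the same position used in the second unicode.org link above.
    """
    return _LATIN_THEN_PERSIAN_DIGIT.sub(lambda m: m.group(1) + RLM, text)


if __name__ == "__main__":
    fixed = insert_rlm("F14 \u06f7\u06f8\u06f0")
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Running it a second time changes nothing, because the whitespace is then followed by the RLM rather than by a digit. Whether this heuristic is right for every sentence is a separate question; it will also fire in places where the user really did want the number to follow the Latin text.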

I have another question, because I didn't know about the bidi tool on 
unicode.org. The paragraph direction setting seems to have no effect; 
to make the paragraph RTL I have to add an RLE mark at the beginning. 
Am I correct, or am I doing something wrong?

Nevertheless, your answer has been very helpful.

Thanks a lot!

Edu.


