[FriBidi] Fwd: Visual to Logical Order

Beni Cherniavsky-Paskin cben at users.sf.net
Mon Dec 7 07:12:03 PST 2015


It's funny: we humans run this non-existent algorithm all the time; the
sole purpose of UAX 9 and FriBidi is feeding more inputs to our Vis2Log
algo :-)

Some thoughts:

- Do you know the base direction?  Autodetecting it from visual input is
  harder than from logical input.
  E.g. "he said EMBEDDING" and "HE SAID embedding" both look like "ltr RTL"
  visually.
  Comparing the percentage of LTR vs. RTL strong characters might work
  better than the first-strong rule.
  Dangling punctuation at one end (especially .?!) may be quite a robust
  sign, if present.

- If the visual input has any markup/formatting, translating all structure
  into FSI..PDI isolates should help.

- Do you have line breaks?  It seems to me that per-line reordering would
  complicate reassembly, and log2vis cannot work well on split lines.
  Though experimenting with "bidiv -w 10" I have trouble finding any
  problematic example, so it's probably not severe...
  OK, got one with numbers:

# The first `bidiv -w $w` generates visual data with line splits (I can't use
# `fribidi --caprtl` because it still wrongly does line wrapping after
# reordering).
# The second `bidiv` approximates vis2log; it can be replaced with
# `fribidi --ltr --nopad` with exactly the same results.

$ for w in $(seq 15); do
    echo 'foo bar BAZ 123456789 QUUX they said' |
      perl -C -pe 'use utf8; tr/A-Z/א-ש/' |
      bidiv -j -w $w |
      bidiv -j |
      perl -C -pe 'use utf8; tr/א-ש/A-Z/' | tr -d '\n'
    echo "  # -w $w"
done
foo bar BAZ 123456789 QUUX they said  # -w 1
foo bar BAZ 123456789 QUUX they said  # -w 2
foo bar BAZ 123456789 QUUX they said  # -w 3
foo bar BAZ 123456789 QUUX they said  # -w 4
foo bar BA123 Z456789 QUUX they said  # -w 5
foo bar BAZ 123456789 QUUX they said  # -w 6
foo bar 12 BAZ3456789 QUUX they said  # -w 7
foo bar 1234 BAZ56789 QUUX they said  # -w 8
foo bar B123456 AZ789 QUUX they said  # -w 9
foo bar BA12345678 Z9 QUUX they said  # -w 10
foo bar BAZ 123456789 QUUX they said  # -w 11
foo bar BAZ 123456789 QUUX they said  # -w 12
foo bar 1 BAZ23456789 QUUX they said  # -w 13
foo bar 12 BAZ3456789 QUUX they said  # -w 14
foo bar 123 BAZ456789 QUUX they said  # -w 15

As can be expected, joining the lines (`tr -d '\n'`) before the second
"vis2log" bidiv makes things worse, e.g. XQUU 456789Z 123BA.
Per-line reordering can make contiguous text non-contiguous, and deleting
the break points definitely loses data in a way that would be hard to
recover even for AI-complete humans.
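A rough sketch of the first two suggestions, in Python (my choice of language; nothing below is a FriBidi API). `guess_base_direction` and `isolate_segments` are hypothetical helpers: the first trusts dangling .?! at one visual edge and otherwise takes a majority vote of strong characters as reported by the standard `unicodedata.bidirectional`; the second wraps pieces that have already been converted back to logical order in U+2068 FSI ... U+2069 PDI. Both are untested heuristics, not something UAX 9 or FriBidi specifies.

```python
import unicodedata

SENTENCE_END = ".?!"
FSI, PDI = "\u2068", "\u2069"  # FIRST STRONG ISOLATE, POP DIRECTIONAL ISOLATE


def guess_base_direction(visual_line):
    """Heuristic guess of 'LTR' or 'RTL' for one line of visual-order text."""
    stripped = visual_line.strip()
    if stripped:
        # Sentence-final punctuation dangles at the right edge of an LTR line
        # and at the left edge of an RTL line.
        if stripped[-1] in SENTENCE_END and stripped[0] not in SENTENCE_END:
            return "LTR"
        if stripped[0] in SENTENCE_END and stripped[-1] not in SENTENCE_END:
            return "RTL"
    # Fallback: majority vote over strong characters instead of first-strong.
    ltr = sum(1 for ch in visual_line if unicodedata.bidirectional(ch) == "L")
    rtl = sum(1 for ch in visual_line if unicodedata.bidirectional(ch) in ("R", "AL"))
    return "RTL" if rtl > ltr else "LTR"


def isolate_segments(segments):
    """Join structural pieces (cells, styled spans...), isolating each one."""
    return "".join(FSI + seg + PDI for seg in segments)
```

Ties and lines without strong characters fall back to LTR here; that default is an arbitrary choice.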

[Resending, as my first try from a non-subscribed address was rejected;
sorry if you get it twice.]

2015-11-14 20:47 GMT+02:00 Dov Grobgeld <dov.grobgeld at gmail.com>:

> Hi Hossein,
>
> Currently fribidi has no algorithm for visual to logical conversion, and as
> you said, in general this inversion cannot be done. A first guess would be
> simply to run logical to visual on the visual string.
>
> A better approach would be something similar to the following:
>
> 1. Let V(s) denote the algorithm doing L→V conversion, i.e.
> fribidi_log2vis() in fribidi; let v be the visual string and s the logical
> string.
> 2. Make an initial guess of the logical string, s_0 = V(v).
> 3. While V(s_i) ≠ v: set s_{i+1} = M(s_i) and try again.
>
> This basically turns the problem into one of how to write the
> modification function M(s). This could probably be done by comparing V(s)
> and v and using the differences between them to move misplaced characters
> to the other side of L and R boundaries.
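A minimal Python sketch of this loop, under loud assumptions. `toy_log2vis` is a hypothetical, heavily simplified stand-in for fribidi_log2vis(): LTR paragraph, uppercase treated as RTL roughly like `fribidi --caprtl`, lowercase as LTR, digits as European numbers, and only rules W7, N1/N2, I1 and L2 of UAX 9. M is swapped for a plain best-first search over adjacent transpositions, ranked by how many positions of V(s) differ from v. That search is exhaustive and exponential, so it only shows the shape of the loop, not a practical M.

```python
import heapq


def char_class(ch):
    # Toy classes (assumption): uppercase = R, lowercase = L, digit = EN, rest neutral.
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "EN"
    return "N"


def toy_log2vis(s):
    """Hypothetical simplified V(s) for an LTR paragraph (not fribidi_log2vis)."""
    types = [char_class(ch) for ch in s]
    n = len(types)
    # W7: a number whose nearest preceding strong type is L (or sos, L here) becomes L.
    last_strong = "L"
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"
    # N1/N2: a neutral run takes its neighbours' direction if they agree
    # (EN counts as R), else the paragraph direction L; sos/eos are L.
    def strong(t):
        return "R" if t in ("R", "EN") else "L"
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = strong(types[i - 1]) if i > 0 else "L"
        after = strong(types[j]) if j < n else "L"
        types[i:j] = [before if before == after else "L"] * (j - i)
        i = j
    # I1: at paragraph level 0, R goes to level 1 and EN to level 2.
    levels = [{"L": 0, "R": 1, "EN": 2}[t] for t in types]
    # L2: from the highest level down to 1, reverse every run at or above it.
    chars = list(s)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            levels[i:j] = reversed(levels[i:j])
            i = j
    return "".join(chars)


def vis2log(visual, log2vis=toy_log2vis, max_steps=100_000):
    """Search for s with log2vis(s) == visual; None if none is found."""
    def mismatch(s):
        # Number of positions where V(s) differs from the target v.
        return sum(a != b for a, b in zip(log2vis(s), visual))

    start = log2vis(visual)  # s_0 = V(v), the "first guess"
    frontier = [(mismatch(start), 0, start)]
    seen = {start}
    counter = 1
    for _ in range(max_steps):
        if not frontier:
            break
        distance, _tie, s = heapq.heappop(frontier)
        if distance == 0:  # V(s) == v
            return s
        for k in range(len(s) - 1):  # stand-in for M: every adjacent swap
            t = s[:k] + s[k + 1] + s[k] + s[k + 2:]
            if t not in seen:
                seen.add(t)
                heapq.heappush(frontier, (mismatch(t), counter, t))
                counter += 1
    return None
```

With numbers even this toy V is not its own inverse: toy_log2vis("AB 12") gives "12 BA", and vis2log("12 BA") returns "12 AB", a valid preimage but not the original text. And "AB 12" itself has no preimage under this toy V, so vis2log("AB 12") returns None after trying every permutation.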
>
> Another idea would be to check out the algorithm in the ICU sources, or to
> ask the lead ICU author if he could provide some references to the
> heuristics they used.
>
> Regards,
> Dov
>
> On Sat, Nov 14, 2015 at 4:18 AM, Hossein Khatoonabadi <hkhatoonabadi at pdftron.com> wrote:
>
>> Hi / Dorood,
>>
>> I'm looking for a function that converts visual-order text to a logical
>> string using the FriBidi library.
>>
>> I know that there is no standard algorithm for this, and there is no
>> unique answer. However, ICU has a function that approximates logical from
>> visual order. I've read the FriBidi mailing-list discussion of this topic
>> from a couple of years ago, but it reached no acceptable
>> conclusion/workaround at the time.
>>
>> It has been years now since that discussion, and I'm sure many like us are
>> still looking for such a function. Does anyone have any idea?