[Fribidi-discuss] Bugfix patches: --debug and remove_marks

Sun May 25 07:07:02 EST 2003

Behdad Esfahbod wrote on 2003-05-24:

> On Fri, 23 May 2003, Beni Cherniavsky wrote:
>
> > While debugging a wrapper script [1]_ around fribidi for creating
> > examples of bidi in reStructureText [2]_ documents (I'll post one soon
> > ;-), I bumped into a series of bugs in fribidi.
>
> Or features ;-).

The levels thing was actually a bug in the library itself.

> The fact behind is that fribidi command line tool is for test
> purposes, so I never spent too much time on it.

I'm aware of that.  However there *is* a need for a robust program
that can show how bidi works and that includes displaying the levels;
without levels it's hard to see how precisely the text was reordered
and is it right for the right reasons ;-).

> Applications would link to the library, or just need the visual
> output of the command line tool.  Many of its advanced feature don't
> work properly with eachother.  I will rewrite it in a more robust
> way some time in the future.
>
I'm ready to work on it but I think it makes sense to first get the
new API done.  I'll send another mail with some API ideas.

> > The script displayed embedding --levels and used --clean (because UAX9
> > doesn't define levels of explicit bidi codes, so I wanted to avoid
> > them in examples).  The levels were obviously wrong, e.g.:
> >
> >[snip]
>
> Hummm, you are right.  Never needed them to work!
>
> > Turns out the ltov is wrong too.  It seems that the intention in
> > `fribidi_remove_bidi_marks()` was to map to the original positions
> > (which makes sense) so I concluded that the ltov and vtol should be of
> > different lengths:
> >
> > diff -ru -w ../fribidi-cvs/fribidi.c ./fribidi.c
> > --- ../fribidi-cvs/fribidi.c	2003-05-23 13:53:45.000000000 +0300
> > +++ ./fribidi.c	2003-05-23 16:12:01.000000000 +0300
> > @@ -1084,6 +1084,11 @@
> >  /*======================================================================
> >   *  fribidi_remove_bidi_marks() removes bidirectional marks, and returns
> >   *  the new length, updates each of other inputs if not NULL.
> > + *
> > + *  Note that position_from_this_list might become shorter but refers to
> > + *  the orginal positions and position_to_this_list remains of the same
> > + *  length but contains -1 values at positions of removed marks.
> >   *----------------------------------------------------------------------*/
> >  FRIBIDI_API FriBidiStrIndex
> >  fribidi_remove_bidi_marks (FriBidiEnv *fribidienv,

I forgot: the comment should also be updated in fribidi.h

> >[snip rest of patch]
> >
> > This solves the ltov:
> >
> > $ fribidi --caprtl --showinput --nopad --levels --ltov --vtol
> > ABC_>def   => _>defCBA
> > 6 5 4 0 1 2 3
> > 3 4 5 6 2 1 0
> > 1 1 1 2 2 2 2
> > $ fribidi --caprtl --showinput --nopad --levels --ltov --vtol --clean
> > ABC_>def   => defCBA
> > 5 4 3 -1 0 1 2
> > 4 5 6 2 1 0
> > 1 1 2 2 2 2
> >
> > But not the levels.  A short investigation shows that the deleted
> > level is in position 0 - which is the visual position of the LRM,
> > whereas the levels list is in logical order!  So I'm not sure how to
> > solve this.
>
> I will find some workaround in the future.  I need some feedback
> that how people need this thing to work.

I tried now to hack a "fix" into `fribidi_remove_bidi_marks()` to work
with levels in the other (logical vs. visual) order.  First, it
becomes very convoluted.  Second, it can't work in principle if the
function is called with non-NULL levels but NULL from_this/to_this
lists.  So I think the right thing to do is to leave this function as
it is -- it should process the given levels in same order as the
string it's given and call it separately on the logical string in
fribidi_main.c:

--- fribidi_main.c.prev 2003-05-25 15:24:46.000000000 +0300
+++ fribidi_main.c      2003-05-25 16:57:30.000000000 +0300
@@ -428,11 +428,13 @@

                   new_len = len;

-                  /* Remove explicit marks, if asked for. */
+                  /* Remove explicit marks in visual string, if asked for.
+                     Levels are in logical order and are cleaned later if
+                     needed. */
                   if (do_clean)
                     len =
                       fribidi_remove_bidi_marks (NULL, visual, len, ltov,
-                                                 vtol, levels);
+                                                 vtol, NULL);

                   if (show_visual)
                     {
@@ -532,6 +534,22 @@
                     {
                       FriBidiStrIndex i;

+                      /* Remove levels of bidi marks. This changes the
+                         string, which might still be needed (e.g. for
+                         show_changes), so we work on a copy... */
+                      if (do_clean)
+                        {
+                          FriBidiChar *clean_logical =
+                            ALLOCATE (NULL, FriBidiChar, orig_len);
+
+                          for (i = 0; i < orig_len; i++)
+                            clean_logical[i] = logical[i];
+                          fribidi_remove_bidi_marks (NULL, clean_logical,
+                                                     orig_len, NULL, NULL,
+                                                     levels);
+                          fribidi_free(NULL, clean_logical);
+                        }
+
                       printf (nl_found);
                       for (i = 0; i < len; i++)
                         printf ("%d ", (int) levels[i]);

Works for me.

> If there is no real life use, ...
>
There is!  Levels are mainly intersting with --clean (because the rest
is not specified by UAX9).  So let's show them correctly.

> > Also, when I started investigating, I first rebuilt with configure
> > --enable-debug.  This didn't compile!  A debug-only funtion was
> > missing the fribidienv parameter:
>
> This one is not mine, the fribidi-env patch is by Omer Za :).
>
Is the fact that the debug build doesn't compile a good or a bad sign?
;-)

> >  /* Here, only for test porpuses, we have assumed that a fribidi_string
> >     ends with a 0 character */

Should I fix that assumption too, BTW?  Sounds like it will subtly
break one day.

> > Another problem was with ``make check``.  It blatantly failed because
> > the output of fribidi --showinput depends on the size of the screen.
> > ``COLUMNS=80 make check`` passed.  This is quite ridiculous.
>
> Fixed by setting COLUMNS to 80 in run.tests.
>
That's fixing the symptoms.  I don't object to the two-column format
for human consumption but I don't see a reason to use --showinput in
the tests at all.  However, there remains one more COLUMNS dependency:
padding for RTL lines.  So I'd also add --nopad (and --basedir) for
more robust testing...

> > I suggest to make --showinput display in on another line, which
> > would be much more robust.

For parsing by other programs - but they don't need ``--showinput`` if
they sent it in ;-).

> > I also think the tests should include more than the output, at
> > least also the levels (then these --clean issues would not go
> > unnoticed).
>
> Infact the levels are not part of the standard,

They are - the moment you expose something called "levels" in your API
and do not write in big letters that it's *not* supposed to match the
"levels" described in the standard algorithm ;-).

> so I prefer not to check them for conformance, but for regression
> testing it is a good idea.

Exactly, after all, they *are* supposed to match the standard.  After
you remove the bidi marks, that is, because UAX9 specifically doesn't
define their level (that's why I'm using --levels with --clean).

> All ideas would go in the next version, with a major redesign.
>
Agreed.  I'll write some redesign ideas separately.  But let's make a
bugfix release of this version.

> > I also think each line could be prefixed by its type and matching
> > chars properly aligned, somewhat like my script currently does::
>
> I just didn't changed the old style.  The current style is more
> readable for most users.
>
> > :Logical:  ``THE <LRE>file "SOME name" wasn't<PDF> FOUND.``
> > :Base dir: R
> > :Levels:   ``1111     12222223333222222222222     2211111``
> > :Visual:   ``.DNUOF file "EMOS name" wasn't EHT``
> >
> > (here you see again that the levels are wrong)
>
> Why?
>
Why wrong? The levels change at the wrong points, e.g. between f and i
in ``file``.

Why my is more readable?  Because it must be aligned with the input
for humans to understand what's happening.

Why prefixed?  Because that might be easier to parse from a program
calling us to do the dirty work.  I've just read the excellent `The
Art Of Unix Programming`__ draft so I'm now especially sensitive to
this ;-).

__ http://www.catb.org/~esr/writings/taoup/

What format exactly should it take?  Doesn't matter much but not
exactly as I showed (that's tuned for pretty reStructuredText output).

> > The ltov/vtol would frequently not fit into a single char; perhaps
> > base 36 should be employed...
>
> This is why I put spaces between them.
>
Perhaps the command-line fribidi should continue working roughly as it
does now and I can polish my script until it's ready to be included
with fribidi as a human-friendly util.

> > BTW, `char_from_level_array` allows for only 16 levels,
> > is this an artifact of lower limits in early TR9 versions?  I also saw
> > a refernce to a maximum of 15 in
> > <http://www.w3.org/TR/REC-CSS2/visuren.html#direction>.
>
> This is because all bugs practically have test cases with less
> than 16 levels!  I simply didn't need more to debug.
>
Got it!  But I wonder at the above CSS2 link - was there any time when
TR9 specified a smaller number of levels?

And while we are at number of levels, the few 63 constants should IMHO
be replaced by some new FRIBIDI_MAX_LEVEL define...

-- 
Beni Cherniavsky <cben at users.sf.net>