[Fribidi-discuss] Bugfix patches: --debug and remove_marks
Beni Cherniavsky
cben at techunix.technion.ac.il
Sun May 25 07:07:02 EST 2003
Behdad Esfahbod wrote on 2003-05-24:
> On Fri, 23 May 2003, Beni Cherniavsky wrote:
>
> > While debugging a wrapper script [1]_ around fribidi for creating
> > examples of bidi in reStructureText [2]_ documents (I'll post one soon
> > ;-), I bumped into a series of bugs in fribidi.
>
> Or features ;-).
The levels thing was actually a bug in the library itself.
> The fact behind is that fribidi command line tool is for test
> purposes, so I never spent too much time on it.
I'm aware of that. However there *is* a need for a robust program
that can show how bidi works and that includes displaying the levels;
without levels it's hard to see how precisely the text was reordered
and is it right for the right reasons ;-).
> Applications would link to the library, or just need the visual
> output of the command line tool. Many of its advanced feature don't
> work properly with eachother. I will rewrite it in a more robust
> way some time in the future.
>
I'm ready to work on it but I think it makes sense to first get the
new API done. I'll send another mail with some API ideas.
> > The script displayed embedding --levels and used --clean (because UAX9
> > doesn't define levels of explicit bidi codes, so I wanted to avoid
> > them in examples). The levels were obviously wrong, e.g.:
> >
> >[snip]
>
> Hummm, you are right. Never needed them to work!
>
> > Turns out the ltov is wrong too. It seems that the intention in
> > `fribidi_remove_bidi_marks()` was to map to the original positions
> > (which makes sense) so I concluded that the ltov and vtol should be of
> > different lengths:
> >
> > diff -ru -w ../fribidi-cvs/fribidi.c ./fribidi.c
> > --- ../fribidi-cvs/fribidi.c 2003-05-23 13:53:45.000000000 +0300
> > +++ ./fribidi.c 2003-05-23 16:12:01.000000000 +0300
> > @@ -1084,6 +1084,11 @@
> > /*======================================================================
> > * fribidi_remove_bidi_marks() removes bidirectional marks, and returns
> > * the new length, updates each of other inputs if not NULL.
> > + *
> > + * Note that position_from_this_list might become shorter but refers to
> > + * the orginal positions and position_to_this_list remains of the same
> > + * length but contains -1 values at positions of removed marks.
> > *----------------------------------------------------------------------*/
> > FRIBIDI_API FriBidiStrIndex
> > fribidi_remove_bidi_marks (FriBidiEnv *fribidienv,
I forgot: the comment should also be updated in fribidi.h
> >[snip rest of patch]
> >
> > This solves the ltov:
> >
> > $ fribidi --caprtl --showinput --nopad --levels --ltov --vtol
> > ABC_>def => _>defCBA
> > 6 5 4 0 1 2 3
> > 3 4 5 6 2 1 0
> > 1 1 1 2 2 2 2
> > $ fribidi --caprtl --showinput --nopad --levels --ltov --vtol --clean
> > ABC_>def => defCBA
> > 5 4 3 -1 0 1 2
> > 4 5 6 2 1 0
> > 1 1 2 2 2 2
> >
> > But not the levels. A short investigation shows that the deleted
> > level is in position 0 - which is the visual position of the LRM,
> > whereas the levels list is in logical order! So I'm not sure how to
> > solve this.
>
> I will find some workaround in the future. I need some feedback
> that how people need this thing to work.
I tried now to hack a "fix" into `fribidi_remove_bidi_marks()` to work
with levels in the other (logical vs. visual) order. First, it
becomes very convoluted. Second, it can't work in principle if the
function is called with non-NULL levels but NULL from_this/to_this
lists. So I think the right thing to do is to leave this function as
it is -- it should process the given levels in same order as the
string it's given and call it separately on the logical string in
fribidi_main.c:
--- fribidi_main.c.prev 2003-05-25 15:24:46.000000000 +0300
+++ fribidi_main.c 2003-05-25 16:57:30.000000000 +0300
@@ -428,11 +428,13 @@
new_len = len;
- /* Remove explicit marks, if asked for. */
+ /* Remove explicit marks in visual string, if asked for.
+ Levels are in logical order and are cleaned later if
+ needed. */
if (do_clean)
len =
fribidi_remove_bidi_marks (NULL, visual, len, ltov,
- vtol, levels);
+ vtol, NULL);
if (show_visual)
{
@@ -532,6 +534,22 @@
{
FriBidiStrIndex i;
+ /* Remove levels of bidi marks. This changes the
+ string, which might still be needed (e.g. for
+ show_changes), so we work on a copy... */
+ if (do_clean)
+ {
+ FriBidiChar *clean_logical =
+ ALLOCATE (NULL, FriBidiChar, orig_len);
+
+ for (i = 0; i < orig_len; i++)
+ clean_logical[i] = logical[i];
+ fribidi_remove_bidi_marks (NULL, clean_logical,
+ orig_len, NULL, NULL,
+ levels);
+ fribidi_free(NULL, clean_logical);
+ }
+
printf (nl_found);
for (i = 0; i < len; i++)
printf ("%d ", (int) levels[i]);
Works for me.
> If there is no real life use, ...
>
There is! Levels are mainly intersting with --clean (because the rest
is not specified by UAX9). So let's show them correctly.
> > Also, when I started investigating, I first rebuilt with configure
> > --enable-debug. This didn't compile! A debug-only funtion was
> > missing the fribidienv parameter:
>
> This one is not mine, the fribidi-env patch is by Omer Za :).
>
Is the fact that the debug build doesn't compile a good or a bad sign?
;-)
> > /* Here, only for test porpuses, we have assumed that a fribidi_string
> > ends with a 0 character */
Should I fix that assumption too, BTW? Sounds like it will subtly
break one day.
> > Another problem was with ``make check``. It blatantly failed because
> > the output of fribidi --showinput depends on the size of the screen.
> > ``COLUMNS=80 make check`` passed. This is quite ridiculous.
>
> Fixed by setting COLUMNS to 80 in run.tests.
>
That's fixing the symptoms. I don't object to the two-column format
for human consumption but I don't see a reason to use --showinput in
the tests at all. However, there remains one more COLUMNS dependency:
padding for RTL lines. So I'd also add --nopad (and --basedir) for
more robust testing...
> > I suggest to make --showinput display in on another line, which
> > would be much more robust.
For parsing by other programs - but they don't need ``--showinput`` if
they sent it in ;-).
> > I also think the tests should include more than the output, at
> > least also the levels (then these --clean issues would not go
> > unnoticed).
>
> Infact the levels are not part of the standard,
They are - the moment you expose something called "levels" in your API
and do not write in big letters that it's *not* supposed to match the
"levels" described in the standard algorithm ;-).
> so I prefer not to check them for conformance, but for regression
> testing it is a good idea.
Exactly, after all, they *are* supposed to match the standard. After
you remove the bidi marks, that is, because UAX9 specifically doesn't
define their level (that's why I'm using --levels with --clean).
> All ideas would go in the next version, with a major redesign.
>
Agreed. I'll write some redesign ideas separately. But let's make a
bugfix release of this version.
> > I also think each line could be prefixed by its type and matching
> > chars properly aligned, somewhat like my script currently does::
>
> I just didn't changed the old style. The current style is more
> readable for most users.
>
> > :Logical: ``THE <LRE>file "SOME name" wasn't<PDF> FOUND.``
> > :Base dir: R
> > :Levels: ``1111 12222223333222222222222 2211111``
> > :Visual: ``.DNUOF file "EMOS name" wasn't EHT``
> >
> > (here you see again that the levels are wrong)
>
> Why?
>
Why wrong? The levels change at the wrong points, e.g. between f and i
in ``file``.
Why my is more readable? Because it must be aligned with the input
for humans to understand what's happening.
Why prefixed? Because that might be easier to parse from a program
calling us to do the dirty work. I've just read the excellent `The
Art Of Unix Programming`__ draft so I'm now especially sensitive to
this ;-).
__ http://www.catb.org/~esr/writings/taoup/
What format exactly should it take? Doesn't matter much but not
exactly as I showed (that's tuned for pretty reStructuredText output).
> > The ltov/vtol would frequently not fit into a single char; perhaps
> > base 36 should be employed...
>
> This is why I put spaces between them.
>
Perhaps the command-line fribidi should continue working roughly as it
does now and I can polish my script until it's ready to be included
with fribidi as a human-friendly util.
> > BTW, `char_from_level_array` allows for only 16 levels,
> > is this an artifact of lower limits in early TR9 versions? I also saw
> > a refernce to a maximum of 15 in
> > <http://www.w3.org/TR/REC-CSS2/visuren.html#direction>.
>
> This is because all bugs practically have test cases with less
> than 16 levels! I simply didn't need more to debug.
>
Got it! But I wonder at the above CSS2 link - was there any time when
TR9 specified a smaller number of levels?
And while we are at number of levels, the few 63 constants should IMHO
be replaced by some new FRIBIDI_MAX_LEVEL define...
--
Beni Cherniavsky <cben at users.sf.net>
More information about the FriBidi
mailing list