[poppler] Multicolumn select

Baz brian.ewins at gmail.com
Fri Nov 13 03:56:26 PST 2009


Hi,
I uploaded a new version of my multicolumn select patches to
https://bugs.freedesktop.org/show_bug.cgi?id=3188 this morning, as you
might've seen. This version uses a similar algorithm to ocropus to
determine reading order, and tries to make the selection follow this
reading order. It's looking fairly good now, I think: for all but one
of the documents I tested, it picked a reasonable order, and
selection doesn't jump all over the place. Of course, I've only tested
on the handful of docs that were in the bug reports so I might've made
things worse elsewhere :(
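
To make the idea concrete, here is a minimal sketch of the kind of
reading-order pass described above: a pairwise "comes before" relation
between blocks, followed by a depth-first topological sort. It is an
illustration only, not the code from the patch. The Block type, the
coordinate convention (y grows downward) and the exact ordering rules
are assumptions, loosely modelled on the rules published for OCRopus;
the name depth_first_visit is borrowed from the patch purely as a label.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block: axis-aligned box, y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def x_overlap(a, b):
    """True if the horizontal extents of a and b intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def comes_before(a, b, blocks):
    """Assumed ordering rules (not taken from the patch):
    1. if a and b overlap horizontally, the higher one comes first;
    2. otherwise, if a lies entirely to the left of b, a comes first,
       unless some block c sits vertically between them and overlaps
       both horizontally (e.g. a full-width heading)."""
    if x_overlap(a, b):
        return a.y0 < b.y0
    if a.x1 <= b.x0:
        top, bottom = min(a.y0, b.y0), max(a.y0, b.y0)
        for c in blocks:
            if c is a or c is b:
                continue
            if top < c.y0 < bottom and x_overlap(c, a) and x_overlap(c, b):
                return False
        return True
    return False


def reading_order(blocks):
    """Topologically sort blocks by comes_before using a depth-first visit.
    Marking a block as visited before recursing means any cycle in the
    relation is broken arbitrarily instead of looping forever."""
    order, visited = [], set()

    def depth_first_visit(i):
        if i in visited:
            return
        visited.add(i)
        # Emit every block that must precede block i before emitting i.
        for j in range(len(blocks)):
            if j != i and comes_before(blocks[j], blocks[i], blocks):
                depth_first_visit(j)
        order.append(blocks[i])

    # Start from a top-to-bottom, left-to-right scan for stable output.
    for i in sorted(range(len(blocks)),
                    key=lambda k: (blocks[k].y0, blocks[k].x0)):
        depth_first_visit(i)
    return order
```

With a full-width title above two columns, this sketch reads the whole
left column before the right one; a full-width block between two rows
of columns acts as a separator, so the columns above it are read before
anything below it. The pairwise test makes the pass quadratic or worse
in the number of blocks, which is fine for a page's worth of blocks but
is a simplification.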

I was wondering what I can do to get these patches into an acceptable
state. There are some obvious issues still to iron out, e.g. RTL (see
http://bugs.kde.org/show_bug.cgi?id=156380 ,
http://bugs.kde.org/show_bug.cgi?id=184399) and handling blocks with
non-zero rotation; also, the new depth_first_visit method I added is in
the wrong class and should probably be in TextBlock. I'll fix this up.

But beyond that, these patches might be problematic because they
remove the old selection behaviour. The new behaviour is much better
for multicolumn documents, but is likely to be worse at selecting data
out of tables, for example. Should the new selection mode introduce
new API, so as not to change the current behaviour of Evince &
Okular[1]? Older versions of Acrobat had 'table select' and
'text select' modes, covering these two uses, but more recently table
select has been dropped entirely. I suspect that it now just follows
the tags in tagged PDF, with the fallback behaviour being something
like what I've coded up here.

Also, testing. At the moment, testing for me consists of opening a
bunch of documents in Evince and selecting stuff randomly (I don't
have Okular, but since they use the same API for text selection I
presume the bug is the same). I have no idea if I'm introducing
regressions. Is there a plan to integrate the unit test framework that
was discussed previously?
http://lists.freedesktop.org/archives/poppler/2009-March/004535.html .
Or failing that, is there a pool somewhere of test documents for
poppler/evince/okular? I'm particularly after docs with rotated
blocks and an RTL doc to test with; neither the RTL selection nor the
search bugs had docs attached. Vertical text too, I guess.

Cheers,
Baz

