[poppler] Multicolumn select

Carlos Garcia Campos carlosgc at gnome.org
Sun Nov 15 04:17:16 PST 2009


Excerpts from Baz's message of Fri Nov 13 12:56:26 +0100 2009:
> Hi,

Hi Brian, 

> I uploaded a new version of my multicolumn select patches to
> https://bugs.freedesktop.org/show_bug.cgi?id=3188 this morning, as you
> might've seen.

Yes, it's great to know you are working on this again :-) Thank you
very much.

> This version uses a similar algorithm to ocropus to
> determine reading order, and tries to make the selection follow this
> reading order. It's looking fairly good now I think - for all but one
> of the documents I tested with it picked a reasonable order, and
> selection doesn't jump all over the place. Of course, I've only tested
> on the handful of docs that were in the bug reports so I might've made
> things worse elsewhere :(

I've just tried it and found some issues; see these self-explanatory
screenshots:

http://people.freedesktop.org/~carlosgc/poppler-multi-column-issue1.png
http://people.freedesktop.org/~carlosgc/poppler-multi-column-issue2.png

The line selection (triple-click) seems to be broken too. 

> I was wondering what I can do to get these patches into an acceptable
> state. There are some obvious issues still to iron out, e.g. RTL (see
> http://bugs.kde.org/show_bug.cgi?id=156380 ,
> http://bugs.kde.org/show_bug.cgi?id=184399) and handling blocks with
> non-zero rotation; also the new depth_first_visit method I added is in
> the wrong class - should probably be in TextBlock. I'll fix this up.

The current behaviour has been broken for a long time, so any improvement,
even if still a bit broken, is much appreciated.

> But beyond that, these patches might be problematic because they
> remove the old selection behaviour. The new behaviour is much better
> for multicolumn documents, but is likely to be worse at selecting data
> out of tables, for example. Should the new selection mode introduce
> new API, so as not to change the current behaviour of Evince &
> Okular[1]?

Having a new API would definitely make things easier, yes.

> In older versions of Acrobat, they had 'table select' and
> 'text select' modes, covering these two uses, but more recently table
> select has been dropped entirely. I suspect that they now just follow
> the tags in tagged PDF, with the fallback behaviour being something
> like what I've coded up here.
> 
> Also, testing. At the moment, testing for me consists of opening a
> bunch of documents in Evince and selecting stuff randomly (I don't
> have Okular, but since they use the same API for text selection I
> presume the bug is the same).

Well, Okular doesn't use TextOutputDev for selecting, but it does for
extracting the text, so it will be affected anyway. 

> I have no idea if I'm introducing
> regressions. Is there a plan to integrate the unit test framework that
> was discussed previously?
> http://lists.freedesktop.org/archives/poppler/2009-March/004535.html
> .

Yes, but I didn't manage to get it working without crashing :-(

> Or failing that, is there a pool somewhere of test documents for
> poppler/evince/okular?

Yes, Albert has a regression test script, so he can run it with your
patches applied. 

> Particularly if someone has docs with rotated
> blocks, and an RTL doc to test; neither the RTL selection nor search
> bugs had docs attached; also vertical text I guess.
>
> Cheers,
> Baz

We are closer to fixing it, keep up the good work!
-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462