[poppler] Multicolumn select

Albert Astals Cid aacid at kde.org
Sun Nov 15 14:37:21 PST 2009


On Friday, 13 November 2009, Baz wrote:
> Hi,
> I uploaded a new version of my multicolumn select patches to
> https://bugs.freedesktop.org/show_bug.cgi?id=3188 this morning, as you
> might've seen. This version uses a similar algorithm to ocropus to
> determine reading order, and tries to make the selection follow this
> reading order. It's looking fairly good now, I think: for all but one
> of the documents I tested, it picked a reasonable order, and
> selection doesn't jump all over the place. Of course, I've only tested
> on the handful of docs that were in the bug reports so I might've made
> things worse elsewhere :(
> 
> I was wondering what I can do to get these patches into an acceptable
> state. There are some obvious issues still to iron out, e.g. RTL (see
> http://bugs.kde.org/show_bug.cgi?id=156380 ,
> http://bugs.kde.org/show_bug.cgi?id=184399) and handling blocks with
> non-zero rotation; also the new depth_first_visit method I added is in
> the wrong class; it should probably be in TextBlock. I'll fix this up.
> 
> But beyond that, these patches might be problematic because they
> remove the old selection behaviour. The new behaviour is much better
> for multicolumn documents, but is likely to be worse at selecting data
> out of tables, for example. Should the new selection mode introduce
> new API, so as not to change the current behaviour of Evince &
> Okular[1]?

What was the [1] supposed to mean here? As Carlos said, we use our own
algorithm, coded in Okular, for text selection, so I'm not sure this should
affect us much. On the other hand, we still have the same problem with columns,
so if this works we should probably apply a similar solution to Okular.
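For readers following the thread, the kind of reading-order computation Baz describes can be sketched as a pairwise "comes before" relation between text blocks, followed by a depth-first topological sort. This is a minimal sketch, not code from the patch, from poppler or from OCRopus: the `Block` type and every function name are hypothetical, and the two ordering rules (blocks in the same column are read top to bottom; otherwise the left block comes first unless a block spanning both columns lies between them) are assumed to roughly follow the rules Thomas Breuel describes for OCRopus-style layout analysis. It assumes unrotated, left-to-right text with y growing downwards, so it does not address the RTL or rotation issues mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned text block; y grows downwards on the page."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def x_overlaps(a: Block, b: Block) -> bool:
    return a.x0 < b.x1 and a.x1 > b.x0


def left_of(a: Block, b: Block) -> bool:
    return a.x1 < b.x0


def separates(c: Block, a: Block, b: Block) -> bool:
    """True if c lies vertically between a and b and spans the gap between them."""
    if c.y1 < min(a.y0, b.y0) or c.y0 > max(a.y1, b.y1):
        return False
    return c.x0 < a.x1 and c.x1 > b.x0


def before(a: Block, b: Block, blocks: list[Block]) -> bool:
    """Assumed rule set: should block a be read before block b?"""
    if a is b:
        return False
    if x_overlaps(a, b):
        # Same column: read top to bottom.
        return a.y0 < b.y0
    if left_of(a, b):
        # Different columns: left column first, unless a full-width block
        # (heading, figure, rule) sits between the two blocks.
        return not any(separates(c, a, b) for c in blocks)
    return False


def reading_order(blocks: list[Block]) -> list[Block]:
    """Depth-first topological sort of the 'before' relation."""
    visited = [False] * len(blocks)
    result: list[Block] = []

    def visit(k: int) -> None:
        if visited[k]:
            return
        visited[k] = True  # also stops the recursion if the rules form a cycle
        # Emit every block that must precede blocks[k] first.
        for j, other in enumerate(blocks):
            if before(other, blocks[k], blocks):
                visit(j)
        result.append(blocks[k])

    for k in range(len(blocks)):
        visit(k)
    return result
```

For a two-column page with a full-width title and footer, `reading_order` returns the title, the left column, the right column and then the footer, whatever order the blocks are passed in. The `separates` check matters when a full-width block splits the page into two-column sections: without it, a block in the lower left column would be ordered before one in the upper right column, creating a cycle.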

> In older versions of Acrobat, they had 'table select' and
> 'text select' modes, covering these two uses, but more recently table
> select has been dropped entirely. I suspect that they now just follow
> the tags in tagged PDF, with the fallback behaviour being something
> like what I've coded up here.
> 
> Also, testing. At the moment, testing for me consists of opening a
> bunch of documents in Evince and selecting stuff randomly (I don't
> have Okular, but since they use the same API for text selection I
> presume the bug is the same). I have no idea if I'm introducing
> regressions. Is there a plan to integrate the unit test framework that
> was discussed previously?
> http://lists.freedesktop.org/archives/poppler/2009-March/004535.html .

The Qt4 frontend already has unit tests. That thread is about unit tests for
the glib frontend; what you want is a test suite, which is a different thing.

> Or failing that, is there a pool somewhere of test documents for
> poppler/evince/okular? 

Bugzilla is full of them; as I said, I have around 850 here.

> Particularly if someone has docs with rotated
> blocks, and an RTL doc to test; neither the RTL selection nor the search
> bugs had docs attached; also vertical text, I guess.

KDE Bugzilla has some bugs about RTL; see:
http://bugsfiles.kde.org/attachment.cgi?id=25860
http://www.shaham.moag.gov.il/Pages_Files/619756lng64807.pdf

Albert

> 
> Cheers,
> Baz
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
> 