[poppler] Multicolumn select

Baz brian.ewins at gmail.com
Sun Nov 15 17:12:20 PST 2009


2009/11/15 Albert Astals Cid <aacid at kde.org>:
> A Divendres, 13 de novembre de 2009, Baz va escriure:
>> Hi,
>> I uploaded a new version of my multicolumn select patches to
>> https://bugs.freedesktop.org/show_bug.cgi?id=3188 this morning, as you
>> might've seen. This version uses a similar algorithm to ocropus to
>> determine reading order, and tries to make the selection follow this
>> reading order. It's looking fairly good now, I think: for all but one
>> of the documents I tested with, it picked a reasonable order, and
>> selection doesn't jump all over the place. Of course, I've only tested
>> on the handful of docs that were in the bug reports so I might've made
>> things worse elsewhere :(
>>
>> I was wondering what I can do to get these patches into an acceptable
>> state. There are some obvious issues still to iron out, e.g. RTL (see
>> http://bugs.kde.org/show_bug.cgi?id=156380 ,
>> http://bugs.kde.org/show_bug.cgi?id=184399) and handling blocks with
>> non-zero rotation; also the new depth_first_visit method I added is in
>> the wrong class - should probably be in TextBlock. I'll fix this up.
>>
>> But beyond that, these patches might be problematic because they
>> remove the old selection behaviour. The new behaviour is much better
>> for multicolumn documents, but is likely to be worse at selecting data
>> out of tables, for example. Should the new selection mode introduce
>> new API, so as not to change the current behaviour of Evince &
>> Okular[1]?
>
> What was the [1] supposed to mean here?

Typo. A reference to a footnote about me not using Okular that was
left over when I moved that into the text... ignore.

> As Carlos said, we use an Okular-coded algorithm for text selection,
> so I'm not sure this should affect us much. On the other hand, we still
> have the same problem with columns, so if this works we should probably
> apply a similar solution to Okular.

Ok.
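
For anyone wanting to try the same idea in Okular, here is a rough
sketch of the general ocropus-style approach as I understand it: build
a "reads before" partial order over text blocks, then topologically
sort it with a depth-first visit. This is not the code from the patch.
The Box class, the two ordering rules and all the names are assumptions
made up for illustration, and it ignores RTL and rotated blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text block; y grows downward, (x0, y0) is top-left."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def x_overlap(a, b):
    """True if the two boxes share some horizontal extent."""
    return a.x0 < b.x1 and b.x0 < a.x1


def before(a, b, boxes):
    """Partial order: True if a should be read before b."""
    # Rule 1: same column (horizontal overlap) and a starts above b.
    if x_overlap(a, b) and a.y0 < b.y0:
        return True
    # Rule 2: a is entirely left of b, unless some box spanning both
    # (e.g. a full-width heading) sits vertically between them.
    if a.x1 <= b.x0:
        top, bottom = min(a.y0, b.y0), max(a.y0, b.y0)
        return not any(
            c is not a and c is not b
            and x_overlap(c, a) and x_overlap(c, b)
            and top < c.y0 < bottom
            for c in boxes
        )
    return False


def reading_order(boxes):
    """Topological sort of the partial order via depth-first visit."""
    order, seen = [], set()

    def visit(b):
        if b in seen:
            return
        seen.add(b)
        # Emit everything that must be read before b first.
        for a in boxes:
            if a is not b and before(a, b, boxes):
                visit(a)
        order.append(b)

    # Start from the top-left so ties resolve in a natural order.
    for b in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        visit(b)
    return order


page = [
    Box("R2", 55, 50, 100, 70),
    Box("L1", 0, 20, 45, 40),
    Box("title", 0, 0, 100, 10),
    Box("R1", 55, 20, 100, 40),
    Box("L2", 0, 50, 45, 70),
]
print([b.name for b in reading_order(page)])
```

On this two-column page the left column is read in full before the
right one; a full-width block between the rows would stop that.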

>
>> In older versions of Acrobat, they had 'table select' and
>> 'text select' modes, covering these two uses, but more recently table
>> select has been dropped entirely. I suspect that they now just follow
>> the tags in tagged pdf, with the fallback behaviour being something
>> like what I've coded up here.
>>
>> Also, testing. At the moment, testing for me consists of opening a
>> bunch of documents in Evince and selecting stuff randomly (I don't
>> have Okular, but since they use the same API for text selection I
>> presume the bug is the same). I have no idea if I'm introducing
>> regressions. Is there a plan to integrate the unit test framework that
>> was discussed previously?
>> http://lists.freedesktop.org/archives/poppler/2009-March/004535.html .
>
> The Qt4 frontend already has unit tests; the ones in that thread are
> unit tests for the glib frontend, not a test suite, which is what you want.
>

Ok, I see that now. The Qt4 tests refer to documents that aren't in git, though?

>> Or failing that, is there a pool somewhere of test documents for
>> poppler/evince/okular?
>
> Bugzilla is full of them; as I said, I have around 850 here.

I have the dozen or so that were attached to the bugs for copy/paste,
RTL selection and RTL search. There would have been about double that
originally, but the links have died over time. But this stuff is just
random: lots of docs just repeating the same bug, docs with lots of
bugs, hundred-page documents with a bug on one page, etc. It's
particularly ground truth, or expected behaviour, that's missing from a
collection like that.
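
To make "ground truth" concrete, this is the kind of check I have in
mind: a minimal sketch, assuming each test case records a document, a
page, a selection rectangle and the text a human expects to get back.
The extract callable and the case layout are hypothetical; nothing like
this exists in poppler today as far as I know.

```python
import unicodedata


def normalize(text):
    """Compare on content: NFC-normalize, collapse runs of whitespace,
    and drop blank lines so layout noise does not cause failures."""
    text = unicodedata.normalize("NFC", text)
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def check_selection(extract, cases):
    """Run every case through extract(doc, page, rect) and return the
    (doc, page) pairs whose text differs from the expected text."""
    failures = []
    for case in cases:
        got = normalize(extract(case["doc"], case["page"], case["rect"]))
        want = normalize(case["expected"])
        if got != want:
            failures.append((case["doc"], case["page"]))
    return failures


# Stand-in extractor for demonstration only; a real one would call
# into the selection code under test.
def fake_extract(doc, page, rect):
    canned = {
        ("two-col.pdf", 1): "left one\nleft  two\n\nright one\n",
        ("table.pdf", 3): "b a\n",
    }
    return canned[(doc, page)]


cases = [
    {"doc": "two-col.pdf", "page": 1, "rect": (0, 0, 100, 100),
     "expected": "left one\nleft two\nright one"},
    {"doc": "table.pdf", "page": 3, "rect": (0, 0, 100, 100),
     "expected": "a b"},
]
print(check_selection(fake_extract, cases))
```

The hard part is not the harness but writing down the expected text
for each document, which is exactly what the bug attachments lack.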

>
>> Particularly if someone has docs with rotated
>> blocks, and an RTL doc to test; neither the RTL selection nor search
>> bugs had docs attached; also vertical text I guess.
>
> KDE bugzilla has some bugs about RTL, see
> http://bugsfiles.kde.org/attachment.cgi?id=25860
> http://www.shaham.moag.gov.il/Pages_Files/619756lng64807.pdf

That's a useful example, thanks. I'd been through a whole bunch of the
Ubuntu/KDE/GNOME/fdo bugs and only found one other usable RTL doc, but
this one is much better; it has multicolumn stuff too.

Cheers,
Baz