[poppler] Poppler data structures

Andrea Canciani ranma42 at gmail.com
Tue Nov 2 06:35:26 PDT 2010


On Tue, Nov 2, 2010 at 12:56 PM, Andrea Canciani <ranma42 at gmail.com> wrote:
> While trying to solve https://bugs.freedesktop.org/show_bug.cgi?id=30892
> I found out that poppler is not using std data structures as extensively as it
> could.
>
> In particular the bug seems to be triggered by the very slow loop
> http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.cc#n591
> which has a worst case O(n^2) time (that can actually happen if characters
> are randomly placed on the page as in the pdf attached to the bugreport).
>
> The "obvious" solution would be to use the std data structures (std::set would
> allow the word to be added in O(lg n), so the time would reduce to O(n lg n)),
> but this is not trivial because the word list is not just used as a
> data structure,
> but it is often directly accessed/iterated.
>
> So I've started looking for easier changes that would be possible (and would
> cleanup/reduce the code).
>
> I think that GooVector and GooList would be very easy to remove.
>
> Is there any interest in these patches?

0001 is an example of what removing GooVector would mean.

>
> Could somebody give me some hints about what would be the best way to
> fix TextOutputDev.cc?
> (The changes involved seems to be very big, so maybe I should start overlaying
> the set structure on the list and using it only for inserting/searching and try
> to remove the list only after getting this right)

I tried this approach and it seems to fix the problem (patches 0002 and 0003).
It would also make it possible to remove most bookkeeping code from TextPool
(and maybe some other text classes, too).
Should I continue working in this direction?

>
> Suggestions, comments and "don't do that, we won't merge it" are all appreciated
> (expecially if they can save me some work/time).

Patch 0001 might be ready for master, but it is not clear to me if it
is ok (apparently
using std is now approved, but there might be some issues removing Goo classes
because it would make it harder to port changes from xpdf).

Patches 0002 and 0003 are probably not yet ready for master, but I would like to
get some feedback before doing big changes to TextOutputDev.

Andrea
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Remove-GooVector.patch
Type: application/octet-stream
Size: 18307 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20101102/98e435dd/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Remove-cursor-from-TextPool.patch
Type: application/octet-stream
Size: 2086 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20101102/98e435dd/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Fix-bug-30893.patch
Type: application/octet-stream
Size: 2404 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20101102/98e435dd/attachment-0005.obj>


More information about the poppler mailing list