[poppler] Poppler data structures

Andrea Canciani ranma42 at gmail.com
Tue Nov 2 04:56:32 PDT 2010


While trying to solve https://bugs.freedesktop.org/show_bug.cgi?id=30892
I found out that poppler is not using std data structures as extensively as it
could.

In particular the bug seems to be triggered by the very slow loop
http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.cc#n591
which has a worst case O(n^2) time (that can actually happen if characters
are randomly placed on the page as in the pdf attached to the bugreport).

The "obvious" solution would be to use the std data structures (std::set would
allow the word to be added in O(lg n), so the time would reduce to O(n lg n)),
but this is not trivial because the word list is not just used as a
data structure,
but it is often directly accessed/iterated.

So I've started looking for easier changes that would be possible (and would
cleanup/reduce the code).

I think that GooVector and GooList would be very easy to remove.

Is there any interest in these patches?

Could somebody give me some hints about what would be the best way to
fix TextOutputDev.cc?
(The changes involved seems to be very big, so maybe I should start overlaying
the set structure on the list and using it only for inserting/searching and try
to remove the list only after getting this right)

Suggestions, comments and "don't do that, we won't merge it" are all appreciated
(expecially if they can save me some work/time).

Andrea


More information about the poppler mailing list