[poppler] A commit in git doubled rendering time for some PDFs

Dennis Sheil dennis-poppler at vartmp.com
Fri Oct 22 23:37:28 PDT 2010


I've been looking at a commit that more than doubled rendering speed
for some PDFs.  The commit is from November 2009, but it just hit the
mainstream vanilla stable Ubuntu platform with the release of Ubuntu
10.10 last week (the commit is not to be found in the Debian unstable
poppler package as of yet).  The commit is
f83b677a8eb44d65698b77edb13a5c7de3a72c0f .  The purpose of the commit
was to fix the order of text selection in column displays.

I tested the speed of rendering on three poppler commits - the latest
one (c64a49307782299cb7a950a66419f9d59707f38b), the troubling one
(f83b677a8eb44d65698b77edb13a5c7de3a72c0f) and two commits prior to
the troubling one (345ed51af9b9e7ea53af42727b91ed68dcc52370).  I first
tested rendering on the PDF on which I noticed the problem, which is a
PDF which shows bus routes which are local to me (
http://www.mta.info/nyct/maps/busqns.pdf ).  For the latest commit
(c64a...) and the troubling one (f83b...) rendering time was more than
twice that of the commit two prior to the troubling one (34e...).  On
my desktop, which has a 2.8 Ghz dual core CPU, the post-commit
rendering took over 8 seconds, the pre-commit rendering was less than
3 seconds.

I wanted to test another PDF, so I randomly searched Google for map
PDFs and grabbed another PDF to test -
http://www.ratp.info/picts/touristes/photos/plan%20paris-touriste.pdf
.  Rendering on the pre-f83b commit was 50% faster than from the f83b
commit and the latest (c64a...) commit.  So this is not just on that
one PDF.

Another thing I did was from the latest (c64a...) git commit I
rendered against the latest code, and also rendered against that
latest commit, but with the call to the TextBlock class's
visitDepthFirst method from the TextPage class's coalesce method
commented out.  I got similar results to above - with the method
uncommented, rendering time increased from 50% to over 100%.

I mentioned this on Freenode IRC channel #poppler.  Someone, I believe
it was Andrea Canciani, looked at the visitDepthFirst method and said
the two nested loops looked wrong.  I put a counter in
visitDepthFirst's inner for loop to see how many times it ran on my
one-page 751K PDF.  The inner for loop ran over 196 million times!
That's 32 times for every bit my PDF file has.  From within the inner
for loop, if the TextBlock data structure blk3 is not equal to either
blk1 or blk2, it will make not one, but two calls to the
isBeforeByRule1 method.  I didn't count how many times that hit, but
profiling showed isBeforeByRule1 was called quite a lot.  No wonder my
PDFs are rendering slow.

Anyhow, as I said, that f83b... commit is said to have fixed sort
order for text selected from columns.  Although it also more than
doubling the render speed of some of these PDFs, in which there is no
need for this topological sorting.  So that would be my problem
statement.  Next step is the solution - keeping the old render time
(or at least something near it, in cases where topological sorting is
not important) but still keeping the fix that orders selection of text
from columns properly.

If I have the time, I will do more work on this problem.  I am
currently unsure of whether or not I will be able to find the time to
do more work on it though.

-- Dennis


More information about the poppler mailing list