[poppler] Multi-threading rendering on Raspberry Pi

Wed Feb 22 13:48:52 UTC 2017

I don't really see what is surprising here:

All threads (two in your case) have to share the PDFDoc, Catalog,
XRef table and input stream, so yes, of course there are a lot of mutex
locks, especially when many objects are shared between the pages.
When I developed the multi-threading feature, I ran into a lot of
problems until I had all the locks in the right places, because missing
locks caused garbage rendering and program crashes.
The problem here is that neither poppler nor the underlying xpdf was
designed to use threads at all, so the thread implementation could never
be as optimal as would be desirable.
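
To illustrate the effect of shared state described above, here is a toy
model in Python. It is only a sketch and not poppler code: the names
SharedDocument, fetch, render_page and render_all are hypothetical, and
the single lock stands in for the several mutexes that guard the PDFDoc,
Catalog and XRef table. Every page thread has to go through the same
lock to reach a shared object, so those lookups run one at a time no
matter how many cores are available.

```python
import threading


class SharedDocument:
    """Toy stand-in for state shared by all pages (hypothetical, not poppler's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}      # object number -> parsed object
        self.parse_count = 0  # how many times the slow parse path ran

    def fetch(self, obj_num):
        # Every lookup takes the same lock, so lookups coming from
        # different page threads are serialized.
        with self._lock:
            if obj_num not in self._cache:
                self.parse_count += 1
                self._cache[obj_num] = f"object {obj_num}"
            return self._cache[obj_num]


def render_page(doc, obj_nums, out, index):
    # "Rendering" here only collects the shared objects the page refers to.
    out[index] = [doc.fetch(n) for n in obj_nums]


def render_all(pages):
    """Render each page (a list of object numbers) in its own thread."""
    doc = SharedDocument()
    out = [None] * len(pages)
    threads = [
        threading.Thread(target=render_page, args=(doc, objs, out, i))
        for i, objs in enumerate(pages)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return doc, out
```

With two pages that both refer to objects 2 and 3, for example
render_all([[1, 2, 3], [2, 3, 4]]), each shared object is parsed once,
but only because every access waits for the lock. Without the lock, two
threads could parse and store the same object at the same time, which is
the kind of race that produced the garbage output and crashes mentioned
above.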

So I guess that in your case a lot of the time is spent parsing the PDF
objects and not so much rendering them.
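
As a rough check of that guess, the timings quoted below (11.3 seconds
for two pages in one thread, 8.6 seconds for the same two pages in two
threads) can be put into Amdahl's law. This is only a back-of-the-envelope
sketch: it assumes the simple Amdahl model applies and that nothing but
serialization (for example memory bandwidth on the Pi) limits the
speedup, and the function name serial_fraction is made up for this
example.

```python
def serial_fraction(t_sequential, t_parallel, n_threads):
    """Estimate Amdahl's serial fraction s from measured wall-clock times.

    Amdahl's law: speedup = 1 / (s + (1 - s) / n).
    Solving for s gives: s = (n / speedup - 1) / (n - 1).
    """
    if n_threads < 2:
        raise ValueError("need at least two threads to estimate a serial fraction")
    speedup = t_sequential / t_parallel
    return (n_threads / speedup - 1.0) / (n_threads - 1)


# Timings from the quoted message: 11.3 s sequential, 8.6 s with two threads.
s = serial_fraction(11.3, 8.6, 2)
print(f"estimated serialized share: {s:.0%}")  # prints "estimated serialized share: 52%"
```

Under these assumptions roughly half of the work per page would be
running one thread at a time, which fits the picture of most time going
into locked parsing rather than into the rendering itself.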

Cheers,
Thomas

On 22.02.2017 at 04:23, pqt at LEFerguson.com wrote:
> I have a PDF rendering program for sheet music running on a Raspberry Pi 3, using Poppler 0.51.0 built from source, running in Qt 5.8 through the Qt5 API.
>
> I am seeing some weird threading performance behavior.  I am calling page->renderToImage within a separate thread, or more precisely within several of them.
>
> I am not getting any errors, and the results are correct.
>
> For example, in rendering the same two (and only two) pages in a single thread, it takes 5.7 and 5.6 seconds to render, a total of 11.3 seconds.  When rendered in parallel it takes 8.6 seconds for the first to complete, and an additional 50ms +/- for the second, i.e. basically 8.6 seconds total.
>
> There is no IO that I can see going on at the time, there is no swap file (so no swap usage), plenty of memory, and nothing else running except the desktop services to display the images.
>
> That's faster, but not nearly as much faster as I anticipated.
>
> Three at a time gives about 8 seconds for the first, about 1.5 seconds for the second, and 0.6 for the third (I say "about" as my 3 page render was different content).
>
> Even though no IO occurs, increasing to 4 I still cannot get the processor busy (e.g. as seen by "top"), seeming to imply some constraint beyond cores.
>
> Here's what is more strange.  If I submit 3 pages in a row in order 1, 2, 3 to three separate threads (the Pi3 has 4 cores), these always finish in order 3, 2, 1.  I've instrumented these in as many ways as I can to confirm the sequence (and yes, that they really are running in separate threads). That's not a big deal program-logic wise, but it is an odd symptom.   That aspect is reproducible on a fast HyperV box I use for testing (it processes them fast enough that the rendering speed is not terribly meaningful there) - it is always in reverse order.  And not all that close (i.e. it's not a stream IO issue with the debug output).
>
> Makes me wonder if something is blocking/serialized, forcing the LIFO behavior and so perhaps keeping me from getting the most performance.
>
> Are there any special considerations for using Poppler with multi-threaded rendering?   Different cmake options for example?   Different calling sequences?
>
> I realize that the Pi3 architecture might be causing this, e.g. memory speed so multi-threading is less efficient. I really did not think much about it until I realized the renders (started within a millisecond of each other) always finish in reverse order of initiation.
>
> Incidentally, I have tried compiles (of poppler as well as my application) with both -O2 and -O3 with negligible difference in performance.
>
> Any suggestions or insights would be welcomed.
>
> Linwood Ferguson