[Poppler-bugs] [Bug 85196] Huge spike in CPU and memory usage by tracker extractor due to rogue file

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Oct 20 05:27:46 PDT 2014


https://bugs.freedesktop.org/show_bug.cgi?id=85196

--- Comment #3 from Martyn Russell <martyn at lanedo.com> ---
(In reply to Adrian Johnson from comment #2)
> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.

Yeah, but for a 2 MB file that's still rather a lot of memory to draw 100k
characters. The speed is also why Tracker will SIGABRT on this file: that's
far too long to spend extracting text from a PDF, and arguably there is none
to extract anyway :)
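
For scale, here is a rough back-of-the-envelope calculation using only the
figures quoted above. It is a sketch, not a measurement: it assumes decimal
units, reads "over 100,000" as exactly 100,000 (so the per-character values
are upper bounds), and attributes the whole peak to the characters even though
some of it is surely general overhead. The variable names are illustrative.

```python
# Figures quoted in comment #2 and in this reply.
peak_memory_bytes = 2.7e9   # 2.7 GB peak memory (decimal GB assumed)
elapsed_seconds = 202       # wall-clock time reported in comment #2
char_count = 100_000        # "over 100,000", so results are upper bounds
file_size_bytes = 2e6       # file size read as roughly 2 MB

bytes_per_char = peak_memory_bytes / char_count
ms_per_char = elapsed_seconds / char_count * 1000
memory_to_file_ratio = peak_memory_bytes / file_size_bytes

print(f"{bytes_per_char / 1000:.0f} kB of peak memory per character")
print(f"{ms_per_char:.2f} ms per character")
print(f"peak memory is {memory_to_file_ratio:.0f}x the file size")
```

That works out to at most about 27 kB and 2 ms per dot, with peak memory
more than a thousand times the size of the file.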

Is there a more efficient API we could use, or a way to detect whether there
is any content to extract in the first place, so we can avoid this problem?

> I recall a discussion a few years ago about improving the efficiency of the
> text extraction: 
> 
> http://lists.freedesktop.org/archives/poppler/2010-November/006646.html
> 
> I'm not sure what happened to those patches.

If Evince is using poppler too, this is clearly a problem that extends beyond
Tracker.
