[Poppler-bugs] [Bug 85196] Huge spike in CPU and memory usage by tracker extractor due to rogue file
bugzilla-daemon at freedesktop.org
Mon Oct 20 05:27:46 PDT 2014
https://bugs.freedesktop.org/show_bug.cgi?id=85196
--- Comment #3 from Martyn Russell <martyn at lanedo.com> ---
(In reply to Adrian Johnson from comment #2)
> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.
Yeah, but still, for a 2 MB file that's rather a lot of memory to draw 100k
characters. The slow extraction is also the reason Tracker hits SIGABRT on this
file: that's way too long to extract some text from a PDF, and arguably there is
none anyway :)
Is there another API we could use that is more efficient, or a way to detect
whether there is any content to extract in the first place, so we can avoid
this problem?
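
Until something like that exists, one caller-side workaround is to run the
extraction in a child process with a hard time limit and to discard output that
is almost entirely symbols. The following is only a rough sketch under stated
assumptions: it assumes the pdftotext command-line tool from poppler-utils is
installed, and the helper names, the 10 second limit and the 0.2 ratio are
hypothetical values chosen for illustration, not anything Tracker or poppler
actually does.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout as text.

    Returns None if the command times out, exits non-zero, or is missing.
    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, timeout=timeout_s, check=True
        )
    except (
        subprocess.TimeoutExpired,
        subprocess.CalledProcessError,
        FileNotFoundError,
    ):
        return None
    return result.stdout.decode("utf-8", errors="replace")


def extract_pdf_text(path, timeout_s=10.0):
    """Extract text with pdftotext; "-" sends the output to stdout.

    The 10 second default is an arbitrary, hypothetical limit.
    """
    return run_with_timeout(["pdftotext", path, "-"], timeout_s)


def looks_like_real_text(text, min_alnum_ratio=0.2):
    """Heuristic: output dominated by symbols (e.g. U+22C5) is not real text.

    The 0.2 threshold is an assumption for illustration only.
    """
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return False
    alnum = sum(c.isalnum() for c in visible)
    return alnum / len(visible) >= min_alnum_ratio
```

A caller would treat a None result from extract_pdf_text() as "gave up" and
would skip indexing when looks_like_real_text() returns False. This bounds the
time spent per file, but it does not make the extraction itself any cheaper,
and the symbol check only runs after the text has already been extracted.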
> I recall a discussion a few years ago about improving the efficiency of the
> text extraction:
>
> http://lists.freedesktop.org/archives/poppler/2010-November/006646.html
>
> I'm not sure what happened to those patches.
This is clearly a problem that extends beyond Tracker if Evince is using
poppler too.