<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Huge spike in CPU and memory usage by tracker extractor due to rogue file"
href="https://bugs.freedesktop.org/show_bug.cgi?id=85196#c3">Comment # 3</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Huge spike in CPU and memory usage by tracker extractor due to rogue file"
href="https://bugs.freedesktop.org/show_bug.cgi?id=85196">bug 85196</a>
from <span class="vcard"><a class="email" href="mailto:martyn@lanedo.com" title="Martyn Russell &lt;martyn@lanedo.com&gt;"> <span class="fn">Martyn Russell</span></a>
</span></b>
<pre>(In reply to Adrian Johnson from <a href="show_bug.cgi?id=85196#c2">comment #2</a>)
<span class="quote">> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.</span>
Yeah, still, for a 2MB file, that's rather a lot of memory to use to draw 100k
characters. The speed is also the reason Tracker will SIGABRT on this file:
that's way too long to spend extracting some text from a PDF. Arguably, there
is none anyway :)
Is there another API we could use that is more efficient, or a way to detect
whether there is any content to extract in the first place, to avoid this problem?
<span class="quote">> I recall a discussion a few years ago about improving the efficiency of the
> text extraction:
>
> <a href="http://lists.freedesktop.org/archives/poppler/2010-November/006646.html">http://lists.freedesktop.org/archives/poppler/2010-November/006646.html</a>
>
> I'm not sure what happened to those patches.</span>
This is clearly a problem that extends beyond Tracker if Evince is using
poppler too.</pre>
</div>
</p>
</body>
</html>