<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Huge spike in CPU and memory usage by tracker extractor due to rogue file" href="https://bugs.freedesktop.org/show_bug.cgi?id=85196#c3">Comment # 3</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Huge spike in CPU and memory usage by tracker extractor due to rogue file" href="https://bugs.freedesktop.org/show_bug.cgi?id=85196">bug 85196</a> from <a class="email" href="mailto:martyn@lanedo.com" title="Martyn Russell <martyn@lanedo.com>"> Martyn Russell</a> <pre>(In reply to Adrian Johnson from <a href="show_bug.cgi?id=85196#c2">comment #2</a>)
> The PDF is drawing the dots in the chart with the unicode character U+22C5
> DOT OPERATOR. If you have enough memory and patience the file will be
> successfully processed. On my machine it takes 202 seconds and has peak
> memory usage of 2.7GB. The output file contains over 100,000 U+22C5
> characters.

Yeah, still, for a 2 MB file, that's rather a lot of memory to use for drawing
100k characters.

The speed is also the reason Tracker will SIGABRT on this file: 202 seconds is
way too long to extract some text from a PDF, and arguably there is none
anyway :)

Is there another API we could use that is more efficient, or a way to detect
whether there is any content to extract in the first place, so we can avoid
this problem?

> I recall a discussion a few years ago about improving the efficiency of the
> text extraction:
>
> <a href="http://lists.freedesktop.org/archives/poppler/2010-November/006646.html">http://lists.freedesktop.org/archives/poppler/2010-November/006646.html</a>
>
> I'm not sure what happened to those patches.

This is clearly a problem that extends beyond Tracker if Evince is using
poppler too.</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>