<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEEDINFO "
title="NEEDINFO - Improper text extraction from this pdf"
href="https://bugs.freedesktop.org/show_bug.cgi?id=96932#c3">Comment # 3</a>
on <a class="bz_bug_link
bz_status_NEEDINFO "
title="NEEDINFO - Improper text extraction from this pdf"
href="https://bugs.freedesktop.org/show_bug.cgi?id=96932">bug 96932</a>
from <span class="vcard"><a class="email" href="mailto:jason@aquaticape.us" title="Jason Crain <jason@aquaticape.us>"> <span class="fn">Jason Crain</span></a>
</span></b>
<pre>I doubt that anyone is intentionally trying to hide information. PDF is
primarily a display format, and unless the PDF creator does the extra work of
including encoding tables and dictionaries (for example, a ToUnicode CMap for
each font), it's easy to create a PDF that displays the correct glyphs but
can't be converted to text. I haven't taken a close look at this PDF, but if
other viewers also can't extract the text, that's a good sign the PDF was made
without support for text extraction. There are heuristics in poppler that try
to handle this situation by guessing what the characters should be, but the
guesses will never be completely accurate.</pre>
</div>
</p>
</body>
</html>