<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEEDINFO " title="NEEDINFO - Improper text extraction from this pdf" href="https://bugs.freedesktop.org/show_bug.cgi?id=96932#c3">Comment # 3</a> on <a class="bz_bug_link bz_status_NEEDINFO " title="NEEDINFO - Improper text extraction from this pdf" href="https://bugs.freedesktop.org/show_bug.cgi?id=96932">bug 96932</a> from <a class="email" href="mailto:jason@aquaticape.us" title="Jason Crain <jason@aquaticape.us>"> Jason Crain</a> <pre>I doubt that anyone is intentionally trying to hide information. It's just that PDF is primarily a display format and unless the PDF creator does the extra work to include some encoding tables and dictionaries, it's easy to create a PDF that displays the correct glyphs, but can't be converted to text. I haven't taken a close look at this PDF, but if other viewers are also not able to extract the text, it's a good sign that the PDF was made without support for text extraction. There are heuristics in poppler that try to deal with that situation by guessing what the characters should be, but it's never going to be completely accurate.</pre> </div> </body> </html>