[Poppler-bugs] [Bug 96932] Improper text extraction from this pdf
bugzilla-daemon at freedesktop.org
Fri Jul 15 15:55:29 UTC 2016
https://bugs.freedesktop.org/show_bug.cgi?id=96932
--- Comment #3 from Jason Crain <jason at aquaticape.us> ---
I doubt that anyone is intentionally trying to hide information. PDF is
primarily a display format, and unless the PDF creator does the extra work
of including encoding tables and dictionaries, it's easy to create a PDF
that displays the correct glyphs but can't be converted to text.
I haven't taken a close look at this PDF, but if other viewers are also not
able to extract the text, it's a good sign that the PDF was made without
support for text extraction. There are heuristics in poppler that try to deal
with that situation by guessing what the characters should be, but it's never
going to be completely accurate.
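
To illustrate the point above, here is a rough sketch in Python. It is a toy
model, not poppler's code: the function extract_text, the table
GLYPH_NAME_GUESSES and the sample data are all made up for illustration. It
assumes a font is just a map from character codes to glyph names, optionally
accompanied by a code-to-Unicode table, and that the only fallback heuristic
is guessing from the glyph name.

```python
# Toy model, not poppler code. A "font" maps character codes to glyph names
# and may optionally ship a code-to-Unicode table (the creator's extra work).
REPLACEMENT = "\ufffd"  # U+FFFD, used when no guess is possible

# Hypothetical, tiny glyph-name table used as the fallback heuristic.
GLYPH_NAME_GUESSES = {"H": "H", "i": "i", "exclam": "!"}


def extract_text(codes, glyph_names, to_unicode=None):
    """Convert character codes to text, guessing when no table is present."""
    out = []
    for code in codes:
        if to_unicode is not None and code in to_unicode:
            # Reliable path: the PDF says what the code means.
            out.append(to_unicode[code])
        else:
            # Heuristic path: guess from the glyph name, which may be opaque.
            name = glyph_names.get(code, "")
            out.append(GLYPH_NAME_GUESSES.get(name, REPLACEMENT))
    return "".join(out)


codes = [1, 2, 3]                               # what the page content holds
descriptive = {1: "H", 2: "i", 3: "exclam"}     # glyph names that help
opaque = {1: "g1", 2: "g2", 3: "g3"}            # glyph names that don't
mapping = {1: "H", 2: "i", 3: "!"}              # optional Unicode table

with_table = extract_text(codes, opaque, mapping)
guessed = extract_text(codes, descriptive)
failed = extract_text(codes, opaque)
```

In all three cases the page would display the same glyphs. Only the first two
yield usable text, and the second one only because the glyph names happened
to be descriptive.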