[Poppler-bugs] [Bug 104085] rendering pdf and pdftotext give different results

Mon Dec 4 21:59:15 UTC 2017

https://bugs.freedesktop.org/show_bug.cgi?id=104085

--- Comment #3 from Jason Crain <jason at inspiresomeone.us> ---
(In reply to Rafał Mużyło from comment #2)
> Why is it displayed correctly then ?

Because the ToUnicode CMap is only used to look up the Unicode character for
text extraction. Finding the glyph to draw is done using the character code or
name. It might make more sense if you think of PDF as primarily a display
format with text extraction and metadata support added on.
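
As a rough illustration of that split, here is a toy Python sketch. It is not
poppler's code: the two tables, their contents, and the function names are all
made up for the example. The point is only that drawing and text extraction
consult different tables for the same character codes, so a wrong Unicode
mapping leaves the rendering untouched.

```python
# Toy model only; all names and table contents here are hypothetical.

# Character code -> glyph name: what a renderer would use to pick the shape.
glyph_for_code = {0x01: "A", 0x02: "B", 0x03: "C"}

# Character code -> Unicode: what a CMap used for text extraction declares.
# Deliberately wrong here, to mimic a PDF with a bad mapping.
unicode_for_code = {0x01: "X", 0x02: "Y", 0x03: "Z"}


def render(codes):
    """Return the glyph names that would be drawn for these codes."""
    return [glyph_for_code[c] for c in codes]


def extract_text(codes):
    """Return the text a text extractor would report for these codes."""
    # Codes with no Unicode mapping become U+FFFD in this sketch.
    return "".join(unicode_for_code.get(c, "\ufffd") for c in codes)


codes = [0x01, 0x02, 0x03]
print(render(codes))        # glyphs come from glyph_for_code
print(extract_text(codes))  # text comes from unicode_for_code
```

The page shows the glyphs A, B, C while the extracted text is whatever the
Unicode table says, and neither step checks the other.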

> Yet, is there nothing pdftotext could do in such case ?

I doubt it. It's doing what the PDF tells it to. If you can show that Adobe
Reader does it differently, then maybe.

> That is, are those two tables only info poppler gets from such pdf file wrt.
> text content ?

No, it's much more complicated. It's detailed in the Text section of the PDF
reference.
