<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - rendering pdf and pdftotext give different results"
href="https://bugs.freedesktop.org/show_bug.cgi?id=104085#c3">Comment # 3</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - rendering pdf and pdftotext give different results"
href="https://bugs.freedesktop.org/show_bug.cgi?id=104085">bug 104085</a>
from <span class="vcard"><a class="email" href="mailto:jason@inspiresomeone.us" title="Jason Crain <jason@inspiresomeone.us>"> <span class="fn">Jason Crain</span></a>
</span></b>
<pre>(In reply to Rafał Mużyło from <a href="show_bug.cgi?id=104085#c2">comment #2</a>)
<span class="quote">> Why is it displayed correctly then ?</span >
Because the CMap is only used to look up the Unicode character for text
extraction. Finding the glyph to draw is done using the character code or name.
It might make more sense if you think of PDF as primarily a display format with
text extraction and metadata support added on.
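
As a rough illustration of the two separate lookups, here is a minimal Python
sketch. It assumes the CMap in question is the font's ToUnicode CMap, and every
table and function name in it is hypothetical, not poppler's actual code.

```python
# Hypothetical tables for one font; a real PDF keeps these in font dictionaries.
# Drawing path: character code to glyph name (from the font's encoding).
CODE_TO_GLYPH = {0x01: "H", 0x02: "i"}

# Extraction path: character code to Unicode (assumed to be the ToUnicode CMap).
# This mapping is deliberately wrong, as in a badly generated PDF.
CODE_TO_UNICODE = {0x01: "X", 0x02: "y"}


def render(codes):
    """Return the glyph names a viewer would draw; the CMap is not consulted."""
    return [CODE_TO_GLYPH[c] for c in codes]


def extract_text(codes):
    """Return the text an extractor would report; only the CMap is consulted."""
    return "".join(CODE_TO_UNICODE[c] for c in codes)


codes = [0x01, 0x02]
print(render(codes))        # glyphs come from the encoding table
print(extract_text(codes))  # text comes from the (wrong) Unicode table
```

The same character codes produce the right glyphs and the wrong text, because
the two lookups never share a table.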
<span class="quote">> Yet, is there nothing pdftotext could do in such case ?</span >
I doubt it. It's doing what the PDF tells it to. If you can show that Adobe
Reader extracts the text differently, then maybe.
<span class="quote">> That is, are those two tables only info poppler gets from such pdf file wrt.
> text content ?</span >
No, it's much more complicated. It's detailed in the Text section of the PDF
reference.</pre>
</div>
</p>
</body>
</html>