[Poppler-bugs] [Bug 104085] New: rendering pdf and pdftotext give different results
bugzilla-daemon at freedesktop.org
Mon Dec 4 19:56:25 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=104085
Bug ID: 104085
Summary: rendering pdf and pdftotext give different results
Product: poppler
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: utils
Assignee: poppler-bugs at lists.freedesktop.org
Reporter: galtgendo at o2.pl
...well, of course they do: one renders the PDF graphically, the other just tries
to extract the text...
However, the issue is this: I've stumbled upon a PDF file that displays
correctly, but pdftotext was dumping strings that looked like they had typos,
except that the "typo" was always the same character.
So, I've looked into the content.
699 0 obj
<<
/BaseEncoding /WinAnsiEncoding
/Differences [
1
/zdot
/aogonek
/eogonek
/sacute
/cacute
/Sacute
/nacute
/Zdot
/zacute
/Zacute
]
/Type /Encoding
>>
endobj
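In a /Differences array, an integer sets the character code for the glyph name that follows it, and each further name takes the next code until another integer appears. Codes not listed keep their /BaseEncoding (here WinAnsiEncoding) names, which this sketch ignores. The Python sketch below expands object 699's array, transcribed by hand as Python values; the function name expand_differences is hypothetical. It shows that code <01> is /zdot and code <08> is /Zdot.

```python
def expand_differences(differences):
    """Turn a /Differences array into a {code: glyph_name} dict.

    An integer sets the code for the next glyph name; each name after it
    gets the following code, until another integer resets the counter.
    """
    mapping = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError(f"glyph name {item} appears before any code")
            mapping[code] = item.lstrip("/")
            code += 1
    return mapping


# The /Differences array of object 699, transcribed as Python values.
DIFFERENCES_699 = [1, "/zdot", "/aogonek", "/eogonek", "/sacute", "/cacute",
                   "/Sacute", "/nacute", "/Zdot", "/zacute", "/Zacute"]

for code, name in sorted(expand_differences(DIFFERENCES_699).items()):
    print(f"<{code:02x}> /{name}")
```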
700 0 obj
<<
/Ascent 625
/CapHeight 625
/Descent -177
/Flags 4
/FontBBox [
5
-177
638
877
]
/FontFile2 712 0 R
/FontName /RDZRPI+TimesNewRoman
/ItalicAngle 0
/MissingWidth 777
/StemV 95
/Type /FontDescriptor
>>
endobj
701 0 obj
<<
/Length 702 0 R
>>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapType 2 def
/CMapName/R706 def
1 begincodespacerange
<00><ff>
endcodespacerange
10 beginbfrange
<01><01><015c>
<02><02><0105>
<03><03><0119>
<04><04><015b>
<05><05><0107>
<06><06><015a>
<07><07><0144>
<08><08><015b>
<09><09><017a>
<0a><0a><0179>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end end
endstream
endobj
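Object 701 is a CMapType 2 stream with bfrange entries, which is the form used for a font's /ToUnicode mapping. Each entry <lo><hi><dst> maps codes lo..hi to dst, dst+1, and so on. Below is a minimal Python sketch that reads those entries as data. It assumes every destination is a single four-hex-digit (BMP) value, as in this file, and is not a full CMap parser. The helper name parse_bfranges is hypothetical. Run on object 701, it maps code <01> to U+015C (Ŝ) and code <08> to U+015B (ś), the same value as code <04>.

```python
import re

# Matches one single-destination bfrange entry such as "<01><01><015c>".
# Array destinations ("<lo><hi>[<...> ...]") are not handled by this sketch.
BFRANGE_ENTRY = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]{4})>")


def parse_bfranges(cmap_text):
    """Return {code: code_point} for the bfrange entries in cmap_text.

    Assumes every destination is a single four-hex-digit (BMP) value.
    """
    mapping = {}
    in_section = False
    for raw in cmap_text.splitlines():
        line = raw.strip()
        if line.endswith("beginbfrange"):
            in_section = True
        elif line == "endbfrange":
            in_section = False
        elif in_section:
            match = BFRANGE_ENTRY.fullmatch(line)
            if match is None:
                continue  # skip entries this sketch does not understand
            lo, hi, dst = (int(group, 16) for group in match.groups())
            # Codes lo..hi map to dst, dst + 1, ... in order.
            for code in range(lo, hi + 1):
                mapping[code] = dst + (code - lo)
    return mapping


# The bfrange section of object 701, copied verbatim.
CMAP_701 = """\
10 beginbfrange
<01><01><015c>
<02><02><0105>
<03><03><0119>
<04><04><015b>
<05><05><0107>
<06><06><015a>
<07><07><0144>
<08><08><015b>
<09><09><017a>
<0a><0a><0179>
endbfrange
"""

for code, value in sorted(parse_bfranges(CMAP_701).items()):
    print(f"<{code:02x}> -> U+{value:04X}")
```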
Well, that's just one of several such sets in the file. The point is that, for
example, /zdot (code <01>, mapped to <015c> above) should map to <017c>; at
least, changing it to that gives correct results in pdftotext, and the PDF
modified that way still displays correctly.
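The PDF specification's rules for converting character codes to Unicode use a font's /ToUnicode CMap first when one is present, and consult other information, such as the encoding's glyph names, only otherwise. Rendering, by contrast, draws glyphs from the embedded font program. Suppose pdftotext follows that order, which is an assumption here and has not been checked against poppler's code. In that case a bad ToUnicode entry would corrupt the extracted text and leave the display untouched.

The Python sketch below cross-checks the two sources for objects 699 and 701, with the data copied by hand. The glyph-name table is a hand-written subset standing in for a full table such as the Adobe Glyph List, and find_mismatches is a hypothetical name. It flags code <01> (/zdot, U+015C instead of U+017C) and code <08> (/Zdot, U+015B, i.e. ś, instead of U+017B). That is consistent with the suggested fix and with the repeated-character "typos".

```python
# Unicode values for the glyph names used in object 699. This hand-written
# subset stands in for a full glyph-name table such as the Adobe Glyph List.
GLYPH_TO_UNICODE = {
    "zdot": 0x017C, "aogonek": 0x0105, "eogonek": 0x0119, "sacute": 0x015B,
    "cacute": 0x0107, "Sacute": 0x015A, "nacute": 0x0144, "Zdot": 0x017B,
    "zacute": 0x017A, "Zacute": 0x0179,
}

# code -> glyph name, from the /Differences array (codes start at 1).
ENCODING_699 = dict(enumerate(
    ["zdot", "aogonek", "eogonek", "sacute", "cacute",
     "Sacute", "nacute", "Zdot", "zacute", "Zacute"], start=1))

# code -> Unicode value, from the bfrange entries of object 701.
TO_UNICODE_701 = {1: 0x015C, 2: 0x0105, 3: 0x0119, 4: 0x015B, 5: 0x0107,
                  6: 0x015A, 7: 0x0144, 8: 0x015B, 9: 0x017A, 10: 0x0179}


def find_mismatches(encoding, to_unicode, glyph_to_unicode):
    """Return (code, glyph, cmap_value, expected) for every code whose
    ToUnicode value differs from the value implied by its glyph name.
    cmap_value is None if the CMap has no entry for the code."""
    mismatches = []
    for code, glyph in sorted(encoding.items()):
        expected = glyph_to_unicode.get(glyph)
        if expected is None:
            continue  # unknown glyph name: nothing to compare against
        actual = to_unicode.get(code)
        if actual != expected:
            mismatches.append((code, glyph, actual, expected))
    return mismatches


for code, glyph, actual, expected in find_mismatches(
        ENCODING_699, TO_UNICODE_701, GLYPH_TO_UNICODE):
    shown = "missing" if actual is None else f"U+{actual:04X}"
    print(f"<{code:02x}> /{glyph}: ToUnicode {shown}, "
          f"glyph name implies U+{expected:04X}")
```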
So, is there a step that pdftotext skips but could perform to get the correct
result, or is this something that only works during on-screen rendering?