[Poppler-bugs] [Bug 37900] New: pdftotext -htmlmeta and pdftohtml fail to decode U+2019
bugzilla-daemon at freedesktop.org
Fri Jun 3 16:17:17 PDT 2011
https://bugs.freedesktop.org/show_bug.cgi?id=37900
Summary: pdftotext -htmlmeta and pdftohtml fail to decode U+2019
Product: poppler
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: general
AssignedTo: poppler-bugs at lists.freedesktop.org
ReportedBy: sjm217-freedesktop at srcf.ucam.org
Created an attachment (id=47501)
--> (https://bugs.freedesktop.org/attachment.cgi?id=47501)
Fix pdftotext -htmlmeta to correctly output U+2019 in PDF metadata
pdftotext -htmlmeta is supposed to parse the PDF metadata and output it as HTML
metadata. It generally works, but fails when decoding U+2019 (right single
quotation mark).
This is because U+2019 may be encoded in PDF documents as the single byte 0x90:
PDFDocEncoding, the PDF document encoding, assigns characters to some of the
code points that ISO 8859-1 leaves reserved. pdfinfo does the right thing, so I
have attached a patch which makes pdftotext use the same approach as pdfinfo.
pdftohtml has the same problem, but I haven't attempted to fix it there.
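
To illustrate the decoding issue described above, here is a minimal Python
sketch. It is not the attached patch and not poppler's code. The names
decode_pdf_text_string and PDFDOC_OVERRIDES are hypothetical, and the table
covers only bytes 0x80-0x9E and 0xA0 of PDFDocEncoding (it omits the accent
characters at 0x18-0x1F). Every other byte is assumed to map to the code point
of the same value, as in ISO 8859-1.

```python
# Partial PDFDocEncoding table: bytes 0x80-0x9E in order (0x9F is undefined).
_HIGH = (
    "\u2022\u2020\u2021\u2026\u2014\u2013\u0192\u2044"  # 0x80-0x87
    "\u2039\u203a\u2212\u2030\u201e\u201c\u201d\u2018"  # 0x88-0x8F
    "\u2019\u201a\u2122\ufb01\ufb02\u0141\u0152\u0160"  # 0x90-0x97
    "\u0178\u017d\u0131\u0142\u0153\u0161\u017e"        # 0x98-0x9E
)
PDFDOC_OVERRIDES = {0x80 + i: ch for i, ch in enumerate(_HIGH)}
PDFDOC_OVERRIDES[0xA0] = "\u20ac"  # Euro sign


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF metadata string (hypothetical helper, partial table)."""
    # A leading FE FF byte order mark means the rest is UTF-16BE.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Otherwise treat the bytes as PDFDocEncoding: apply the overrides and
    # fall back to the code point with the same value as the byte.
    return "".join(PDFDOC_OVERRIDES.get(b, chr(b)) for b in raw)


title = decode_pdf_text_string(b"Alice\x90s Adventures")
print(title == "Alice\u2019s Adventures")
```

Decoding the same bytes as plain ISO 8859-1 would instead yield the control
character U+0090, which is the kind of wrong output this bug is about.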