[Poppler-bugs] [Bug 37900] New: pdftotext -htmlmeta and pdftohtml fail to decode U+2019

bugzilla-daemon at freedesktop.org
Fri Jun 3 16:17:17 PDT 2011


https://bugs.freedesktop.org/show_bug.cgi?id=37900

           Summary: pdftotext -htmlmeta and pdftohtml fail to decode
                    U+2019
           Product: poppler
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: general
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: sjm217-freedesktop at srcf.ucam.org


Created an attachment (id=47501)
 --> (https://bugs.freedesktop.org/attachment.cgi?id=47501)
Fix pdftotext -htmlmeta to correctly output U+2019 in PDF metadata

pdftotext -htmlmeta is supposed to parse the PDF metadata and output it as HTML
metadata. It generally works, but fails to decode U+2019 (right single
quotation mark).

This is because U+2019 may be encoded in PDF metadata strings as the single
byte 0x90: PDFDocEncoding assigns printable characters to some of the byte
values that ISO 8859-1 leaves reserved (the 0x80-0x9F range), so treating such
a string as plain ISO 8859-1 produces the wrong character. pdfinfo does the
right thing, so I have attached a patch which makes pdftotext use the same
approach as pdfinfo. pdftohtml has the same problem, but I haven't attempted to
fix it.
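
To illustrate the decoding issue, here is a minimal Python sketch. It is not
the attached patch and not poppler's code; the names PDFDOC_HIGH and
decode_pdf_text_string are hypothetical, and the table is deliberately partial
(it covers only 0x80-0x9E and 0xA0, not the accent characters PDFDocEncoding
places at 0x18-0x1F). It assumes a metadata string is either UTF-16BE with a
leading FE FF byte order mark or PDFDocEncoding.

```python
import html

# Partial PDFDocEncoding table (illustrative): byte value -> Unicode code point.
# Only the 0x80-0x9E range and 0xA0 are listed; all other bytes are assumed
# to match ISO 8859-1 for the purposes of this sketch.
PDFDOC_HIGH = {
    0x80: 0x2022, 0x81: 0x2020, 0x82: 0x2021, 0x83: 0x2026,
    0x84: 0x2014, 0x85: 0x2013, 0x86: 0x0192, 0x87: 0x2044,
    0x88: 0x2039, 0x89: 0x203A, 0x8A: 0x2212, 0x8B: 0x2030,
    0x8C: 0x201E, 0x8D: 0x201C, 0x8E: 0x201D, 0x8F: 0x2018,
    0x90: 0x2019,  # right single quotation mark, the case in this bug
    0x91: 0x201A, 0x92: 0x2122, 0x93: 0xFB01, 0x94: 0xFB02,
    0x95: 0x0141, 0x96: 0x0152, 0x97: 0x0160, 0x98: 0x0178,
    0x99: 0x017D, 0x9A: 0x0131, 0x9B: 0x0142, 0x9C: 0x0153,
    0x9D: 0x0161, 0x9E: 0x017E, 0xA0: 0x20AC,
}


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF metadata string to a Python str (hypothetical helper)."""
    if raw.startswith(b"\xfe\xff"):
        # A leading byte order mark signals UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # Otherwise map each byte through the table, falling back to the byte
    # value itself (which equals the ISO 8859-1 code point).
    return "".join(chr(PDFDOC_HIGH.get(b, b)) for b in raw)


def html_meta_content(raw: bytes) -> str:
    """Decode, then escape for use inside an HTML attribute value."""
    return html.escape(decode_pdf_text_string(raw), quote=True)


title = b"The author\x90s notes"
naive = title.decode("latin-1")          # keeps 0x90 as a C1 control code
fixed = decode_pdf_text_string(title)    # maps 0x90 to U+2019
```

With this sketch, naive contains the control character U+0090 where the
quotation mark should be, while fixed contains U+2019.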
