[Poppler-bugs] [Bug 54268] New: problem copy/pasting CID? / Identity-H? text

bugzilla-daemon at freedesktop.org
Thu Aug 30 07:50:29 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=54268

             Bug #: 54268
           Summary: problem copy/pasting CID? / Identity-H? text
    Classification: Unclassified
           Product: poppler
           Version: unspecified
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: general
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: fpeters at 0d.be


I have a large number of PDF files on which poppler's text extraction fails
(example at <http://people.gnome.org/~fpeters/pdf-identity-h-bug.pdf>).

The first page is fine, but a second page was then attached, containing a
single word in a monospace font (according to the document properties in
poppler it's "FreeMono, Truetype (CID), encoded as Identity-H"). That word is
displayed correctly but is converted to something entirely different when
copy/pasting from evince, or when using the pdftotext or pdftohtml utilities.

The displayed word is "tapiraient" while the word extracted as text is
"WDSLUDLHQW". In the series of documents I have, other examples give
(extracted -> displayed):

  DQJRLVVHUD -> angoissera
  HQDPRXUHU -> enamourer
  FRQWUHFDUUDLW -> contrecarrait

It looks like the mapping is always the same, and letters are kept in the same
order (e.g. D->a, E->?, F->c, G->?, H->e...). I checked poppler-data and there
is a CMap/Identity-H file, but I couldn't figure out whether it's used, or
relevant.
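
The samples above are all consistent with a constant shift of 29 code points
between the extracted and the displayed character ('D' is 0x44, 'a' is 0x61).
Here is a minimal Python sketch that checks this, assuming the shift holds for
every character. The offset is inferred only from the examples in this report,
not from poppler's code, and `shift_extracted` is a hypothetical helper name:

```python
# Offset inferred from the samples in this report (an assumption,
# not something taken from poppler or from the PDF itself).
OFFSET = ord("a") - ord("D")  # 0x61 - 0x44 = 29


def shift_extracted(text: str, offset: int = OFFSET) -> str:
    """Shift every character of the extracted text up by `offset` code points."""
    return "".join(chr(ord(ch) + offset) for ch in text)


# Extracted text -> word actually displayed, as listed in the report.
samples = {
    "WDSLUDLHQW": "tapiraient",
    "DQJRLVVHUD": "angoissera",
    "HQDPRXUHU": "enamourer",
    "FRQWUHFDUUDLW": "contrecarrait",
}

for extracted, displayed in samples.items():
    recovered = shift_extracted(extracted)
    print(f"{extracted} -> {recovered} (matches: {recovered == displayed})")
```

If the shift really is constant, the two unknown letters would come out as
E->b and G->d, but no sample in this series confirms that.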
