[Poppler-bugs] [Bug 17321] New: Incorrect extraction of /ToUnicode CMaps for ligatures.

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Aug 27 02:47:54 PDT 2008


http://bugs.freedesktop.org/show_bug.cgi?id=17321

           Summary: Incorrect extraction of /ToUnicode CMaps for ligatures.
           Product: poppler
           Version: unspecified
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: general
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: gaburici at cs.umd.edu


Created an attachment (id=18540)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=18540)
Test document.

Attached is a PDF with a CMap that maps T to A and the ligature Th to Ah (as
separate characters). If you search it with Acrobat for "A", you find, as you'd
expect, two As. If you search it with evince, or extract the text with
pdftotext (from poppler-utils), you only get the 2nd CMapped A; the first one
is extracted, incorrectly, as T.
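
To make the expected behaviour concrete, here is a minimal Python sketch of
how a /ToUnicode bfchar section can be read so that one character code may
expand to several Unicode characters. This is not poppler's code. The CMap
fragment, the character codes <01> and <02>, and the helper names
parse_bfchar and extract_text are all assumptions made up for illustration;
they are not taken from the attached document.

```python
import re

# Hypothetical ToUnicode fragment: code <01> stands for the "T" glyph and
# code <02> for the "Th" ligature glyph. The codes are invented for this sketch.
CMAP = """
2 beginbfchar
<01> <0041>
<02> <00410068>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} for every bfchar entry."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination is a UTF-16BE string, so it may decode to
            # more than one character (here "Ah" for the ligature).
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def extract_text(codes, mapping):
    """Map each character code to its Unicode string; U+FFFD if unmapped."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)

to_unicode = parse_bfchar(CMAP)
text = extract_text([0x01, 0x02], to_unicode)
print(text)
```

With both entries honoured, a search for "A" in the extracted text would hit
twice, which matches what Acrobat reports.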

This example is contrived for the sake of keeping it simple and restricted to
English letters, but there are good reasons to want ligatures in CMaps to work
properly in poppler, as they do in Acrobat.

