[Poppler-bugs] [Bug 17321] New: Incorrect extraction of /ToUnicode CMaps for ligatures.
bugzilla-daemon at freedesktop.org
Wed Aug 27 02:47:54 PDT 2008
http://bugs.freedesktop.org/show_bug.cgi?id=17321
Summary: Incorrect extraction of /ToUnicode CMaps for ligatures.
Product: poppler
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: general
AssignedTo: poppler-bugs at lists.freedesktop.org
ReportedBy: gaburici at cs.umd.edu
Created an attachment (id=18540)
--> (http://bugs.freedesktop.org/attachment.cgi?id=18540)
Test document.
Attached is a PDF with a CMap that maps T to A and the ligature Th to Ah (as
separate characters). If you search it with Acrobat for "A", you find, as you'd
expect, two As. If you search it with evince, or extract the text with
pdftotext (from poppler-utils), you only find the second CMapped A; the first one
is extracted, incorrectly, as T.
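For illustration, here is a minimal sketch of the expected behaviour, assuming a simplified /ToUnicode CMap that uses only single-byte codes and `bfchar` entries. The point is that a `bfchar` destination is a UTF-16BE string that may hold more than one character, so a single ligature code can map to "Ah". The code 0x01 for the Th ligature, the names `SAMPLE_CMAP`, `parse_bfchar` and `extract_text`, and the Latin-1 fallback are hypothetical; this is not poppler's implementation and ignores `bfrange` and multi-byte codespaces.

```python
from __future__ import annotations

import re

# Hypothetical CMap fragment: code 0x54 ("T") -> "A", ligature code 0x01 ("Th") -> "Ah".
SAMPLE_CMAP = """
2 beginbfchar
<54> <0041>
<01> <00410068>
endbfchar
"""

# One bfchar entry: <srcCode> <dstString>, both written as hex strings.
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map each source code in beginbfchar...endbfchar sections to its Unicode string."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in BFCHAR_ENTRY.findall(section):
            # The destination is UTF-16BE and may contain several characters,
            # which is how a single ligature glyph maps to e.g. "Ah".
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def extract_text(codes: bytes, mapping: dict[int, str]) -> str:
    """Decode single-byte codes; unmapped codes fall back to Latin-1 (an assumption)."""
    return "".join(mapping.get(c, chr(c)) for c in codes)


if __name__ == "__main__":
    cmap = parse_bfchar(SAMPLE_CMAP)
    # The ligature code followed by a plain T: both occurrences should yield an A.
    print(extract_text(bytes([0x01, 0x54]), cmap))
```

With the whole destination string honoured, the two codes decode to "AhA", so a search for "A" finds two matches, as Acrobat does.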
This example is contrived for the sake of keeping it simple and restricted to
English letters, but there are good reasons to want ligatures in a CMap to work
properly in poppler, as they do in Acrobat.