[Poppler-bugs] [Bug 17321] Incorrect extraction of /ToUnicode CMaps for ligatures.

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Aug 27 07:16:42 PDT 2008


http://bugs.freedesktop.org/show_bug.cgi?id=17321

--- Comment #2 from Vasile Gaburici <gaburici at cs.umd.edu>  2008-08-27 07:16:40 PST ---
I found the bug. Ligatures are set twice in the sMap array: once by poppler's
built-in "smart" algorithm, and then again by the CMap. According to the
comments in GfxFont.cc, the CMap should take precedence, but for ligatures it
does not. CMap ligatures are appended to the end of the sMap array, while
lookup scans linearly from the front and returns the first match. For example,
the Th ligature (char 0) is first set to 00540068 (Th) at index 0 in sMap by
the built-in algorithm, and then to 00410068 at index 1 by the CMap, but lookup
returns the entry at index 0. The fix is to scan the sMap backwards, so that
the most recently added (CMap) entry wins.
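
To illustrate the difference between the two scan directions, here is a
minimal sketch. It is not poppler's actual code: the LigatureEntry struct and
the lookupForward/lookupBackward functions are hypothetical names, and the
array is a simplified stand-in for sMap that assumes duplicates for the same
character code are simply appended.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one multi-character mapping entry.
struct LigatureEntry {
    uint32_t code;                  // character code in the font
    std::vector<uint16_t> unicode;  // UTF-16 code units the code maps to
};

// Forward scan: returns the FIRST entry for `code`, so an earlier built-in
// guess shadows any CMap entry appended later for the same code.
const LigatureEntry *lookupForward(const std::vector<LigatureEntry> &sMap,
                                   uint32_t code) {
    for (std::size_t i = 0; i < sMap.size(); ++i) {
        if (sMap[i].code == code) {
            return &sMap[i];
        }
    }
    return nullptr;  // no mapping for this code
}

// Backward scan: returns the LAST entry for `code`, so the most recently
// appended entry (the CMap one, in the scenario above) takes precedence.
const LigatureEntry *lookupBackward(const std::vector<LigatureEntry> &sMap,
                                    uint32_t code) {
    for (std::size_t i = sMap.size(); i-- > 0;) {
        if (sMap[i].code == code) {
            return &sMap[i];
        }
    }
    return nullptr;  // no mapping for this code
}
```

With an array holding {0x00 -> 0054 0068} at index 0 and {0x00 -> 0041 0068}
at index 1, the forward scan returns the index 0 entry and the backward scan
returns the index 1 entry. An alternative design would be to overwrite the
existing entry when the CMap is parsed, which avoids duplicates altogether.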

$ pdftotext liga-cmap-bug.pdf
Setting @0 00[0] -> 0054
Setting @0 00[1] -> 0068
Adding @1 00: 00410068 + 0
Adding @2 02: 00660066006A + 0
Adding @3 03: 01620068 + 0
Adding @4 0B: 00660066 + 0
Adding @5 0C: 00660069 + 0
Adding @6 0D: 0066006C + 0
Adding @7 0E: 006600660069 + 0
Adding @8 0F: 00660066006C + 0
Adding @9 9C: 0049004A + 0
Adding @10 A0: 0066006A + 0
Adding @11 BC: 0069006A + 0
Returning @0 00[0] -> 0054
Returning @0 00[1] -> 0068

