[Poppler-bugs] [Bug 7002] Text extraction should expand ligatures to their normal form

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Feb 20 03:24:37 PST 2012


https://bugs.freedesktop.org/show_bug.cgi?id=7002

--- Comment #9 from Adrian Johnson <ajohnson at redneon.com> 2012-02-20 03:24:37 PST ---
If we were to add a command-line option for normalizing Unicode, it should normalize all of the text like findText() does, not just the characters from one code
path in the glyph-to-Unicode code.

Thinking about this again, I agree it is probably not a good idea to unconditionally normalize all glyphs. But outputting "fi" style ligatures causes problems
when searching the text. Maybe it would be better to only normalize glyphs in the Alphabetic Presentation Forms range (U+FB00–U+FB4F), since the Unicode
Consortium discourages the use of these presentation forms.
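
As a rough illustration of the range-limited approach suggested above, here is a minimal Python sketch. It is not Poppler code: the function name expand_presentation_forms is hypothetical, and the use of NFKC for the per-character expansion is an assumption, since the comment does not say which normalization form should be applied. Characters outside U+FB00–U+FB4F are passed through untouched.

```python
import unicodedata

# Alphabetic Presentation Forms block, the range named in the comment above.
APF_FIRST = 0xFB00
APF_LAST = 0xFB4F


def expand_presentation_forms(text: str) -> str:
    """Expand only characters in U+FB00..U+FB4F; leave everything else as-is.

    Hypothetical helper for illustration. NFKC is an assumed choice of
    normalization form, applied one character at a time.
    """
    out = []
    for ch in text:
        if APF_FIRST <= ord(ch) <= APF_LAST:
            # Compatibility normalization turns e.g. U+FB01 into "f" + "i".
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            # Outside the block: no normalization at all.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+FB01 is LATIN SMALL LIGATURE FI, U+00B2 is SUPERSCRIPT TWO.
    sample = "of\ufb01ce x\u00b2"
    print(expand_presentation_forms(sample))
```

With this approach a search for "office" would match text extracted from a PDF that used the U+FB01 ligature, while characters such as superscripts, which full normalization of the text would also rewrite, are preserved.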

I tried the "Save as Text" function of acroread on the second test case, and it expanded the ligatures.


