[poppler] Characters with accents not correctly handled
Carl Worth
cworth at cworth.org
Mon Aug 20 11:05:19 PDT 2007
On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> But the real problem is that it is impossible to recognize :
> - "ﬁ" (U+FB01) as "fi" too
> - "ﬀ" (U+FB00) as "ff" too
> Would it be possible to add a new parameter to pdftotext to make it
> ignore ligatures but still export in UTF-8?
It's quite preferable to have the ligatures in your PDF file.
The bug to fix is that poppler should expand the ligatures to their
normalized forms when extracting the text.
That bug was first reported here:
Text extraction should expand ligatures to their normal form
https://bugs.freedesktop.org/show_bug.cgi?id=7002
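
Until that is fixed in poppler itself, the expansion can be done as a
post-processing step on the UTF-8 text that pdftotext produces. Below is
a minimal sketch in Python, not poppler code: the name expand_ligatures
is made up for illustration, and it assumes that only the Latin
ligatures in U+FB00..U+FB06 need expanding. It uses Unicode
compatibility normalization (NFKC) on those code points only, so
accented characters and everything else pass through untouched.

```python
import unicodedata

# Build a translation table for the Latin ligatures U+FB00..U+FB06
# (ff, fi, fl, ffi, ffl, long-s-t, st). Each ligature maps to its
# NFKC compatibility form, e.g. U+FB01 -> "fi".
LIGATURE_TABLE = {
    cp: unicodedata.normalize("NFKC", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points with their plain-letter forms.

    Hypothetical helper for illustration; every other character,
    including accented letters, is left as-is.
    """
    return text.translate(LIGATURE_TABLE)


if __name__ == "__main__":
    import sys

    # Filter stdin to stdout, e.g. on the output of pdftotext.
    sys.stdout.write(expand_ligatures(sys.stdin.read()))
```

Normalizing the whole string with NFKC would also expand the ligatures,
but it rewrites many other characters as well (superscripts, full-width
forms, and so on), which is why the sketch restricts itself to the
ligature range.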
-Carl