[poppler] Characters with accents not correctly handled

Carl Worth cworth at cworth.org
Mon Aug 20 11:05:19 PDT 2007


On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> But the real problem is that it is impossible to recognize:
> - "ﬁ" as "fi" too
> - "ﬀ" as "ff" too
> Would it be possible to add a new parameter to pdftotext to make it
> ignore ligatures but still export in UTF-8?

It's quite preferable to have the ligatures in your PDF file.

The actual bug is that poppler does not expand ligatures to their
normalized forms when extracting the text.

That bug was first reported here:

	Text extraction should expand ligatures to their normal form
	https://bugs.freedesktop.org/show_bug.cgi?id=7002
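
To illustrate what "expanding ligatures to their normal form" means,
here is a minimal Python sketch that could be run over extracted text.
It is not poppler's code and not a pdftotext option. The function name
expand_ligatures and the choice to limit it to the Latin ligatures
U+FB00..U+FB06 are assumptions made for this example; it relies only
on the standard unicodedata module.

```python
import unicodedata

# Latin ligatures in the Unicode "Alphabetic Presentation Forms" block
# (U+FB00 ff, U+FB01 fi, U+FB02 fl, U+FB03 ffi, U+FB04 ffl, ...).
# Restricting to this range is an assumption of this sketch.
LATIN_LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def expand_ligatures(text):
    """Replace each Latin ligature with its compatibility decomposition.

    All other characters, including accented letters, are left as-is.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ch in LATIN_LIGATURES else ch
        for ch in text
    )


# "e" + U+FB03 + "cient", U+FB01 + "le", and an accented "e"
print(expand_ligatures("e\ufb03cient \ufb01le, caf\u00e9"))
# prints: efficient file, café
```

Normalizing the whole string with NFKC would also expand the
ligatures, but it rewrites other compatibility characters as well,
which may not be wanted when the goal is only searchable "fi" and "ff".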

-Carl