[poppler] Characters with accents not correctly handled

Laurent Aguerreche laurent.aguerreche at free.fr
Mon Aug 20 11:31:35 PDT 2007


On Monday, 20 August 2007 at 11:05 -0700, Carl Worth wrote:
> On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> > But the real problem is that it is impossible to recognize:
> > - "ﬁ" as "fi"
> > - "ﬀ" as "ff"
> > Would it be possible to add a new parameter to pdftotext to make it
> > ignore ligatures but still export in UTF-8?
> 
> It's quite preferable to have the ligatures in your PDF file.

When they are rendered correctly, they are great!

> The bug to fix is that poppler should expand the ligatures to their
> normalized forms when extracting the text.
> 
> That bug was first reported here:
> 
> 	Text extraction should expand ligatures to their normal form
> 	https://bugs.freedesktop.org/show_bug.cgi?id=7002

Ok.

So, from Tracker's point of view, this means it won't have to match a
plain "ff" typed by a user in a search against the "ﬀ" ligature
character; users will only need to type "ff".
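For illustration only, here is a minimal sketch of what "expanding ligatures to their normalized forms" can look like, assuming Unicode NFKC normalization is an acceptable stand-in. This is not poppler's code, and the fix for bug #7002 may work differently. The helper names `expand_ligatures` and `search_matches` are hypothetical. NFKC replaces compatibility characters such as U+FB00 (ﬀ) and U+FB01 (ﬁ) with their plain-letter sequences. Note that it also rewrites other compatibility characters (for example fullwidth forms and superscripts), so it does more than ligature expansion.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    # Hypothetical helper: NFKC maps compatibility characters such as
    # U+FB00 LATIN SMALL LIGATURE FF to "ff" and U+FB01 to "fi".
    # It also composes base letters with combining accents.
    return unicodedata.normalize("NFKC", text)


def search_matches(query: str, extracted_text: str) -> bool:
    # Hypothetical search check: normalize both sides so a plain "ff"
    # typed by a user matches extracted text that still contains "ﬀ".
    return expand_ligatures(query) in expand_ligatures(extracted_text)


extracted = "e\ufb00ective \ufb01le"  # "eﬀective ﬁle" with real ligatures
print(expand_ligatures(extracted))             # effective file
print(search_matches("effective", extracted))  # True
```

If the extractor already emits the expanded form, a search tool only has to normalize the user's query. Normalizing both sides, as above, also covers text that was extracted before such a fix.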

But on the Poppler side, is bug #7002 close to being fixed? ;-)


Laurent.

> -Carl