[poppler] Characters with accents not correctly handled

Albert Astals Cid aacid at kde.org
Tue Aug 21 13:30:30 PDT 2007


On Monday 20 August 2007, Carl Worth wrote:
> On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> > But the real problem is that it is impossible to recognize :
> > - "ﬁ" as "fi" too
> > - "ﬀ" as "ff" too
> > Would it be possible to add a new parameter to pdftotext to make it
> > ignore ligatures but still export in UTF-8?
>
> It's quite preferable to have the ligatures in your PDF file.
>
> The bug to fix is that poppler should expand the ligatures to their
> normalized forms when extracting the text.

Actually, I disagree. If you have æ, do you want it expanded to ae too?
If not, why do you want that for the ff ligature?

Albert

>
> That bug was first reported here:
>
> 	Text extraction should expand ligatures to their normal form
> 	https://bugs.freedesktop.org/show_bug.cgi?id=7002
>
> -Carl
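
For reference, here is a minimal sketch of the distinction being argued
over, in Python. It is not poppler code and says nothing about how
pdftotext works internally; it only assumes the extracted UTF-8 text is
post-processed with Unicode compatibility normalization (NFKC). The helper
name expand_ligatures is made up for illustration. NFKC expands the fi and
ff presentation ligatures but leaves æ alone, because æ has no
decomposition in Unicode.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Apply NFKC, which expands compatibility characters such as U+FB01."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 (fi ligature), U+FB00 (ff ligature), U+00E6 (ae letter)
for ch in ("\ufb01", "\ufb00", "\u00e6"):
    print(f"U+{ord(ch):04X} -> {expand_ligatures(ch)!r}")
```

Note that NFKC is broader than ligature expansion: it also rewrites other
compatibility characters (superscript digits, fullwidth forms), so it may
change more of the text than intended.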



