[poppler] Characters with accents not correctly handled

Laurent Aguerreche laurent.aguerreche at free.fr
Sat Aug 18 03:32:06 PDT 2007


Hello,


Some time ago, I filed this bug report against Fedora 7:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247393
and it seems that nothing has happened since.

I am posting to this list because I consider this bug important: it
makes pdftotext completely useless for languages that use accented
characters (like French, in my case).

Furthermore, pdftotext is currently used by Tracker
( http://www.gnome.org/projects/tracker/ ) to extract text from a PDF
file. Tracker then indexes the content of the resulting text file, which
is assumed to match the content of the original PDF file.
Since pdftotext destroys accented characters, Tracker does not
index the words correctly and users cannot find them later.
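
To illustrate why damaged accents break indexing, here is a minimal Python
sketch. It is not Tracker's actual code: the index_words helper and the
sample strings are hypothetical, and the "damaged" form (the accent emitted
as a separate spacing character before the letter) is only an assumption
about what broken output can look like; see the bug report for the real
output.

```python
import re
import unicodedata


def index_words(text):
    """Return the set of lowercase word tokens a simple indexer might store.

    Hypothetical helper for illustration only; not Tracker's implementation.
    """
    # Normalize to NFC so that precomposed and combining forms compare equal.
    text = unicodedata.normalize("NFC", text).lower()
    # \w+ matches runs of letters and digits. A spacing accent such as
    # U+00B4 is not a word character, so it splits the word around it.
    return set(re.findall(r"\w+", text))


# Text as it should be extracted from the PDF.
good = "Le fichier a \u00e9t\u00e9 index\u00e9"
# Assumed damaged extraction: accent emitted as a separate character.
bad = "Le fichier a \u00b4et\u00b4e index\u00b4e"

query = "index\u00e9"
print(query in index_words(good))  # word is found in the correct text
print(query in index_words(bad))   # word is missing from the damaged text
print(sorted(index_words(bad)))    # only fragments of the words remain
```

With the damaged input, the index contains only fragments such as "index"
and "e", so a search for the accented word finds nothing.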

In my bug report, you will find a PDF file with accented characters and
a LaTeX file from which another one can be generated. I also attached the
output I obtained with pdftotext.


I am not very interested in installing Poppler 0.6.x (and to be honest
I am afraid of what such an install could break on my computer:
LaTeX? Some PDF reader? etc.), so I would like to know whether this bug
has been fixed or, if not, to bring it to the attention of the Poppler
developers.


Regards,
Laurent Aguerreche.