[poppler] Characters with accents not correctly handled

Albert Astals Cid tsdgeos at yahoo.es
Sun Aug 19 07:56:52 PDT 2007


--- Laurent Aguerreche wrote:

> Hello,

Hi

> Some time ago, I posted this bug report against
> Fedora 7:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247393
> and it seems that nothing happened...

You don't expect us to look at the Red Hat
bugzilla, do you? ;-)

> I am posting to this ML because I consider this bug
> important, since it
> makes pdftotext completely useless with languages
> using accented
> characters (like French in my case...).

Good move, but for bugs it's better to use the poppler
bugzilla at bugs.freedesktop.org

> Furthermore, pdftotext is currently used by Tracker
> ( http://www.gnome.org/projects/tracker/ ) to
> extract text from a PDF
> file. Then, Tracker can index the content of the text
> file, which is assumed
> to be the content of the initial PDF file.
> Since pdftotext destroys accented characters,
> Tracker does not
> correctly index words and users cannot find them
> later.
> 
> In my bug report, you will find a PDF file with
> accented characters +
> a LaTeX file to reproduce another one. I also added
> what I obtained with
> pdftotext.
> 
>
> I am not very interested in installing Poppler
> 0.6.x (and to be honest
> I am afraid of what such an install could break
> on my computer:
> LaTeX? Some PDF reader? etc.) so I would like to
> know if this bug has
> been fixed, or otherwise to point it out to the
> Poppler developers.

No, it has not been fixed and will not be fixed,
because it is not a bug in poppler. poppler handles
accented characters without any problem. The
problem you are facing is that the program you are
using to generate the PDF is not generating it
"correctly", that is, in a way that makes text
extraction possible.

You can write your very same demonstration text in
oowriter, export to pdf from inside oowriter and see
that pdftotext generates a correct output.

You can then open your LaTeX PDF in Acrobat Reader and
see that it cannot handle the accents correctly either.

So blame LaTeX, not poppler.
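
If regenerating the PDF is not an option, an indexer could try to
repair the extracted text afterwards. Below is a minimal sketch in
Python, assuming the broken output puts a standalone accent glyph
right before the base letter (for example "´e" where "é" was meant).
That output form is an assumption, not something checked against the
attached file, and the function name and accent table are made up for
illustration.

```python
import re
import unicodedata

# Standalone accent glyphs mapped to Unicode combining marks.
# ASSUMPTION: the extracted text uses these glyphs, placed before the letter.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "`": "\u0300",       # grave accent
    "^": "\u0302",       # circumflex
    "\u00a8": "\u0308",  # diaeresis
}

# One accent glyph immediately followed by one ASCII letter.
_ACCENT_THEN_LETTER = re.compile(
    "([" + re.escape("".join(SPACING_TO_COMBINING)) + "])([A-Za-z])"
)


def recompose_accents(text: str) -> str:
    """Turn 'accent glyph + letter' pairs into precomposed letters."""

    def fix(match: re.Match) -> str:
        accent, letter = match.group(1), match.group(2)
        # A combining mark follows its base letter; NFC then merges the pair.
        return unicodedata.normalize("NFC", letter + SPACING_TO_COMBINING[accent])

    return _ACCENT_THEN_LETTER.sub(fix, text)
```

This is only a heuristic: a legitimate "^" or "`" followed by a letter
would be rewritten as well, so it is no substitute for a PDF that
encodes its text properly.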

Albert

> 
> 
> Regards,
> Laurent Aguerreche.