[poppler] Characters with accents not correctly handled

Laurent Aguerreche laurent.aguerreche at free.fr
Sun Aug 19 13:26:40 PDT 2007


On Sunday 19 August 2007 at 16:56 +0200, Albert Astals Cid wrote:
> --- Laurent Aguerreche wrote:
> 
> > Hello,
> 
> Hi

Hi,

> 
> > Some time ago, I posted this bug report against
> > Fedora 7:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247393
> > and it seems that nothing happened...
> 
> You don't expect us to have a look at the Red Hat
> bugzilla, do you? ;-)
> 
> > I am posting to this ML because I consider this bug
> > important, since it
> > makes pdftotext completely useless with languages
> > using accented
> > characters (like French in my case...).
> 
> Good move, but for bugs it's better to use poppler
> bugzilla at bugs.freedesktop.org

Yes, you're completely right! Sorry...  :-/

> > Furthermore, pdftotext is currently used by Tracker
> > ( http://www.gnome.org/projects/tracker/ ) to
> > extract text from a PDF
> > file. Tracker can then index the content of the text
> > file, which is assumed
> > to be the content of the original PDF file.
> > Since pdftotext destroys accented characters,
> > Tracker does not
> > correctly index words and users cannot find them
> > later.
> > 
> > In my bug report, you will find a PDF file with
> > accented characters, plus
> > a LaTeX file to reproduce another one. I also added
> > the output I obtained with
> > pdftotext.
> > 
> >
> > I am not very interested in installing Poppler
> > 0.6.x (and to be honest
> > I am afraid of what such an install could break
> > on my computer:
> > LaTeX? Some PDF reader? etc.), so I would like to
> > know whether this bug has
> > been fixed or, otherwise, to point it out to the
> > Poppler developers.
> 
> No, it has not been fixed and will not be fixed
> because it is not a bug in poppler. poppler handles
> accented characters without any problem; the
> problem you are facing is that the program you are
> using to generate the PDF is not generating it
> "correctly", i.e. in a way that makes text extraction possible.
> 
> You can write your very same demonstration text in
> oowriter, export to pdf from inside oowriter and see
> that pdftotext generates a correct output.

Accents are correctly handled in that case, that's right (but all spaces
are replaced with non-breaking spaces!).
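
For anyone hitting the same space problem, here is a minimal sketch of a
post-processing step. It assumes the spaces in the pdftotext output are
U+00A0 NO-BREAK SPACE characters; which code point is actually emitted is
an assumption, not something checked against the oowriter-generated file.
The function name normalize_spaces is hypothetical.

```python
# Sketch only: assumes the "unbreakable" spaces are U+00A0 (NO-BREAK SPACE).
NO_BREAK_SPACE = "\u00a0"


def normalize_spaces(text: str) -> str:
    """Replace each no-break space with an ordinary space.

    Accented letters and every other character are left untouched.
    """
    return text.replace(NO_BREAK_SPACE, " ")


if __name__ == "__main__":
    # Hypothetical sample standing in for a line of extracted text.
    sample = "caract\u00e8res\u00a0accentu\u00e9s"
    print(normalize_spaces(sample))
```

Something like this could be run on the extracted text before it is handed
to an indexer, so that words are split on ordinary spaces.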

> You can then open your LaTeX PDF in Acrobat Reader and
> see that it cannot handle the accents correctly either.

Hmm... That's not the case. My LaTeX-generated PDF opens perfectly in
acroread, evince, kpdf and xpdf. Why?!

> So blame latex, not poppler.

OK, but if you know what the problem is, are the LaTeX developers aware of
it too? Do you know whether it is fixable?



Thanks for your answer,
Laurent.

> Albert
> 
> > 
> > 
> > Regards,
> > Laurent Aguerreche.