[poppler] No text extracted by pdftohtml
Albert Astals Cid
aacid at kde.org
Wed May 26 12:17:34 PDT 2010
On Sunday, 9 May 2010, Jaime Gómez Obregón wrote:
> Hi everybody,
>
> It seems poppler is unable to extract text from some PDF files:
>
> http://iteisa.com/tmp/poppler-sample.pdf (11 Mb)
>
> pdftohtml from poppler 0.12.4 and 0.12.2 is not able to extract the
> text, and evince shows the document correctly but is unable to select
> its text. However, acroread shows and selects the text correctly (so
> it's normal, editable text and not an image).
>
> Is this normal? Is there any workaround for this?
>
> Everything seems ok with the file:
>
> $ pdfinfo poppler-sample.pdf
> Title: untitled
> Creator: Adobe InDesign CS4 (6.0.4)
> Producer: Acrobat Distiller 9.0.0 (Windows)
> CreationDate: Wed May 5 09:35:12 2010
> ModDate: Wed May 5 09:35:12 2010
> Tagged: no
> Pages: 208
> Encrypted: no
> Page size: 595.276 x 841.89 pts (A4)
> File size: 10536602 bytes
> Optimized: no
> PDF version: 1.4
>
> Best regards,
Please file a bug.
Albert