[poppler] No text extracted by pdftohtml
Jaime Gómez Obregón
jaime at iteisa.com
Sun May 9 07:32:22 PDT 2010
Hi everybody,
It seems poppler is unable to extract the text from some PDF files. Here is a sample:
http://iteisa.com/tmp/poppler-sample.pdf (about 10.5 MB)
pdftohtml from poppler 0.12.4 and 0.12.2 cannot extract the
text. Evince displays the document correctly but cannot select
its text, whereas acroread both displays and selects the text correctly (so
it is ordinary, editable text and not an image).
Is this expected behaviour? Is there any workaround?
pdfinfo reports nothing unusual about the file:
$ pdfinfo poppler-sample.pdf
Title: untitled
Creator: Adobe InDesign CS4 (6.0.4)
Producer: Acrobat Distiller 9.0.0 (Windows)
CreationDate: Wed May 5 09:35:12 2010
ModDate: Wed May 5 09:35:12 2010
Tagged: no
Pages: 208
Encrypted: no
Page size: 595.276 x 841.89 pts (A4)
File size: 10536602 bytes
Optimized: no
PDF version: 1.4
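For anyone who wants to reproduce this, here is a minimal sketch of a check. It assumes that pdftotext and pdffonts from the same poppler installation are on the PATH and that the sample has been saved locally as poppler-sample.pdf; count_text_chars is only an illustrative helper name. It does not identify the cause, it only shows how much text comes out of the first pages and which fonts they use.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): count the non-whitespace
# characters arriving on stdin and print the number without padding.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d '[:space:]'
}

# Assumption: the sample from this message, downloaded to the current directory.
pdf="poppler-sample.pdf"

# Only run the poppler tools if they and the sample file are available.
if command -v pdftotext >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # -f/-l limit the run to pages 1-5; "-" sends the extracted text to stdout.
    n=$(pdftotext -f 1 -l 5 "$pdf" - | count_text_chars)
    echo "non-whitespace characters extracted from pages 1-5: $n"

    # List the fonts used on the same pages, including the "uni" column
    # that says whether each font carries a ToUnicode map.
    pdffonts -f 1 -l 5 "$pdf"
fi
```

A count of 0 would match what pdftohtml shows here. The pdffonts listing is included only because font encoding is one thing worth looking at; whether it is related to this problem is a guess on my part.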
Best regards,
--
Jaime GÓMEZ OBREGÓN (jaime at iteisa.com)
http://www.iteisa.com
Phone: +34 902055277
ITEISA DESARROLLO Y SISTEMAS, S.L
Benidorm, 8 bajo. 39005 Santander.
Spain