[poppler] No text extracted by pdftohtml

Jaime Gómez Obregón jaime at iteisa.com
Sun May 9 07:32:22 PDT 2010

Hi everybody,

It seems poppler is unable to extract text from some PDF files:

http://iteisa.com/tmp/poppler-sample.pdf (11 MB)

pdftohtml from poppler 0.12.4 and 0.12.2 is not able to extract the
text, and evince displays the document correctly but cannot select
its text. However, acroread displays and selects the text correctly (so
it's normal, editable text and not an image).

Is this expected? Is there any workaround for this?

Everything seems OK with the file:

$ pdfinfo poppler-sample.pdf
Title:          untitled
Creator:        Adobe InDesign CS4 (6.0.4)
Producer:       Acrobat Distiller 9.0.0 (Windows)
CreationDate:   Wed May  5 09:35:12 2010
ModDate:        Wed May  5 09:35:12 2010
Tagged:         no
Pages:          208
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
File size:      10536602 bytes
Optimized:      no
PDF version:    1.4
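
For anyone who wants to reproduce the symptom from a script, here is a
minimal sketch. It assumes the pdftotext utility from poppler is on the
PATH (it is used here instead of pdftohtml only because it can write
plain text to stdout); the helper names extract_text and
has_extractable_text are hypothetical, made up for this illustration.

```python
import subprocess


def has_extractable_text(extracted: str) -> bool:
    """Return True if the extractor output holds anything besides whitespace.

    pdftotext separates pages with form feeds, which strip() also removes.
    """
    return bool(extracted.strip())


def extract_text(pdf_path: str, first: int = 1, last: int = 3) -> str:
    """Run pdftotext on a page range and return whatever it prints.

    Assumption: pdftotext is installed; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-f", str(first), "-l", str(last), pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


# Example use (needs the sample file in the current directory):
# print(has_extractable_text(extract_text("poppler-sample.pdf")))
```

If the check comes back False for pages that clearly show text in a
viewer, that matches the behaviour described above.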

Best regards,

Jaime GÓMEZ OBREGÓN (jaime at iteisa.com)
Phone: +34 902055277
Benidorm, 8 bajo. 39005 Santander.
