[poppler] No text extracted by pdftohtml

Albert Astals Cid aacid at kde.org
Wed May 26 12:17:34 PDT 2010


On Sunday, 9 May 2010, Jaime Gómez Obregón wrote:
> Hi everybody,
> 
> It seems poppler is unable to extract text from some PDF files:
> 
> http://iteisa.com/tmp/poppler-sample.pdf (11 MB)
> 
> pdftohtml from poppler 0.12.4 and 0.12.2 is not able to extract the
> text, and evince shows the document correctly but is unable to select
> its text. However, acroread shows and selects the text correctly (so
> it's normal, editable text and not an image).
> 
> Is this normal? Is there any workaround for this?
> 
> Everything seems ok with the file:
> 
> $ pdfinfo poppler-sample.pdf
> Title:          untitled
> Creator:        Adobe InDesign CS4 (6.0.4)
> Producer:       Acrobat Distiller 9.0.0 (Windows)
> CreationDate:   Wed May  5 09:35:12 2010
> ModDate:        Wed May  5 09:35:12 2010
> Tagged:         no
> Pages:          208
> Encrypted:      no
> Page size:      595.276 x 841.89 pts (A4)
> File size:      10536602 bytes
> Optimized:      no
> PDF version:    1.4
> 
> Best regards,

Please file a bug.

Albert

