[Poppler-bugs] [Bug 28282] New: pdftohtml is unable to extract the text in some PDF files
bugzilla-daemon at freedesktop.org
Thu May 27 07:49:03 PDT 2010
https://bugs.freedesktop.org/show_bug.cgi?id=28282
Summary: pdftohtml is unable to extract the text in some PDF files
Product: poppler
Version: unspecified
Platform: x86 (IA32)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: general
AssignedTo: poppler-bugs at lists.freedesktop.org
ReportedBy: jaime at iteisa.com
(As discussed in
http://lists.freedesktop.org/archives/poppler/2010-May/005791.html)
Poppler seems unable to extract text from some PDF files (I'm not
attaching the file to this bug report due to its size):
http://iteisa.com/tmp/poppler-sample.pdf (11 MB)
pdftohtml from poppler 0.12.4 and 0.12.2 is not able to extract the
text, and evince displays the document correctly but is unable to select
its text. However, acroread displays and selects the text correctly (so
it's normal, editable text and not an image).
Everything seems OK with the file according to pdfinfo:
$ pdfinfo poppler-sample.pdf
> Title: untitled
> Creator: Adobe InDesign CS4 (6.0.4)
> Producer: Acrobat Distiller 9.0.0 (Windows)
> CreationDate: Wed May 5 09:35:12 2010
> ModDate: Wed May 5 09:35:12 2010
> Tagged: no
> Pages: 208
> Encrypted: no
> Page size: 595.276 x 841.89 pts (A4)
> File size: 10536602 bytes
> Optimized: no
> PDF version: 1.4
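The symptom can be checked from the command line by counting how many non-whitespace characters an extraction run produces. The following is only a minimal sketch: the helper name count_visible_chars is hypothetical, and it assumes pdftotext (from the same poppler utilities) is on the PATH and that the sample file is in the current directory. Whether pdftotext fails on this file in the same way as pdftohtml is not stated above, so treat the result as a hint rather than a diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: print the number of non-whitespace bytes read from stdin.
# Form feeds (which pdftotext emits between pages) count as whitespace here.
count_visible_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# Usage sketch (assumes pdftotext is installed and the file was downloaded):
#   pdftotext -f 1 -l 5 poppler-sample.pdf - | count_visible_chars
# A result of 0 for pages that visibly contain text would match the
# behaviour described in this report.
```

Limiting the run to the first few pages with -f and -l keeps the check fast on a 208-page document.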