[Poppler-bugs] [Bug 97276] New: Can't extract text/html from PDF

bugzilla-daemon at freedesktop.org
Wed Aug 10 09:52:33 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=97276

            Bug ID: 97276
           Summary: Can't extract text/html from PDF
           Product: poppler
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: pdftohtml
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: clark at electrobeat.dk

pdftohtml does not extract the footer text from this PDF:

http://docdro.id/ms8RyMC

pdftohtml -s -i input.pdf /output

None of the small-print text at the bottom of the page, below the thick
black horizontal line, is extracted.

The lowest part extracted is:

Forfaldsdato . . . . . . . . . . . . . . . . . . . . . . . . . . . . :
10/08-2016
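
To make the report easier to reproduce, here is a minimal sketch of a check script. It runs the same command as above and lists which expected phrases are absent from the generated HTML. The helper names (build_command, missing_phrases, check_extraction) and the glob over "<prefix>*.html" are assumptions made for illustration, not part of poppler. The footer phrases have to be supplied by whoever has the PDF open, since they are not quoted in this report.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, output_prefix):
    # Same flags as in the report: -s (single HTML document), -i (ignore images).
    return ["pdftohtml", "-s", "-i", str(pdf_path), str(output_prefix)]


def missing_phrases(html_text, expected_phrases):
    # Plain substring match; returns the phrases that were NOT found.
    return [phrase for phrase in expected_phrases if phrase not in html_text]


def check_extraction(pdf_path, output_prefix, expected_phrases):
    # Requires pdftohtml on PATH; raises CalledProcessError if it fails.
    subprocess.run(build_command(pdf_path, output_prefix), check=True)
    prefix = Path(output_prefix)
    # Assumption: every generated HTML file starts with the output prefix.
    html_text = "".join(
        html_file.read_text(encoding="utf-8", errors="replace")
        for html_file in sorted(prefix.parent.glob(prefix.name + "*.html"))
    )
    return missing_phrases(html_text, expected_phrases)
```

The substring match is deliberately naive: pdftohtml may emit HTML entities or non-breaking spaces, so a phrase reported as missing should be confirmed by looking at the output by hand.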
