<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body><table border="1" cellspacing="0" cellpadding="8"> <tr> <th>Bug ID</th> <td><a class="bz_bug_link bz_status_NEW " title="NEW - Can't extract text/html from PDF" href="https://bugs.freedesktop.org/show_bug.cgi?id=97276">97276</a> </td> </tr> <tr> <th>Summary</th> <td>Can't extract text/html from PDF </td> </tr> <tr> <th>Product</th> <td>poppler </td> </tr> <tr> <th>Version</th> <td>unspecified </td> </tr> <tr> <th>Hardware</th> <td>x86-64 (AMD64) </td> </tr> <tr> <th>OS</th> <td>Linux (All) </td> </tr> <tr> <th>Status</th> <td>NEW </td> </tr> <tr> <th>Severity</th> <td>major </td> </tr> <tr> <th>Priority</th> <td>medium </td> </tr> <tr> <th>Component</th> <td>pdftohtml </td> </tr> <tr> <th>Assignee</th> <td>poppler-bugs@lists.freedesktop.org </td> </tr> <tr> <th>Reporter</th> <td>clark@electrobeat.dk </td> </tr></table> <p> <div> <pre>pdftohtml does not extract the footer of this PDF:
<a href="http://docdro.id/ms8RyMC">http://docdro.id/ms8RyMC</a>

Command used:

pdftohtml -s -i input.pdf /output

All of the small-font text at the bottom of the page, below the thick
black horizontal line, is missing from the output.

The lowest line that is extracted is:

Forfaldsdato . . . . . . . . . . . . . . . . . . . . . . . . . . . . : 10/08-2016</pre> </div> </p> </body> </html>