<html> <head> <base href="https://bugs.documentfoundation.org/"> </head> <body><span class="vcard"><a class="email" href="mailto:vstuart.foote@utsa.edu" title="V Stuart Foote <vstuart.foote@utsa.edu>"> <span class="fn">V Stuart Foote</span></a> </span> changed <a class="bz_bug_link bz_status_NEW " title="NEW - Offer means to handle import of PDF containing both raster image pages and OCR text" href="https://bugs.documentfoundation.org/show_bug.cgi?id=132493">bug 132493</a> <br> <table border="1" cellspacing="0" cellpadding="8"> <tr> <th>What</th> <th>Removed</th> <th>Added</th> </tr> <tr> <td style="text-align:right;">Ever confirmed</td> <td> </td> <td>1 </td> </tr> <tr> <td style="text-align:right;">Blocks</td> <td> </td> <td>99746 </td> </tr> <tr> <td style="text-align:right;">CC</td> <td> </td> <td>thb@libreoffice.org, vmiklos@collabora.com, vstuart.foote@utsa.edu </td> </tr> <tr> <td style="text-align:right;">Status</td> <td>UNCONFIRMED </td> <td>NEW </td> </tr> <tr> <td style="text-align:right;">Summary</td> <td>error on opening PDF </td> <td>Offer means to handle import of PDF containing both raster image pages and OCR text </td> </tr> <tr> <td style="text-align:right;">Severity</td> <td>normal </td> <td>enhancement </td> </tr></table> <p> <div> <b><a class="bz_bug_link bz_status_NEW " title="NEW - Offer means to handle import of PDF containing both raster image pages and OCR text" href="https://bugs.documentfoundation.org/show_bug.cgi?id=132493#c1">Comment # 1</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Offer means to handle import of PDF containing both raster image pages and OCR text" href="https://bugs.documentfoundation.org/show_bug.cgi?id=132493">bug 132493</a> from <span class="vcard"><a class="email" href="mailto:vstuart.foote@utsa.edu" title="V Stuart Foote <vstuart.foote@utsa.edu>"> <span class="fn">V Stuart Foote</span></a> </span></b> <pre>The PDF opens fine; the issue is that it was prepared with OCR of the page images. 
You can remove the OCR by opening the PDF in your viewer of choice and then printing the result back to PDF. Only the page images will be output, with none of the OCR text runs. Alternatively, if you prefer or need the OCR results, you can keep those instead using LibreOffice Draw. It is a manual process whereby, on each page of the imported PDF, you select the source page's image and delete it, leaving the OCR text runs behind. But it would be convenient if the PDF import filter offered methods to strip out either the image or the OCR text when both are present.</pre> </div> </p> <div id="referenced"> <hr style="border: 1px dashed #969696"> <b>Referenced Bugs:</b> <ul> <li> [<a class="bz_bug_link bz_status_NEW " title="NEW - [META] PDF import filter in Draw" href="https://bugs.documentfoundation.org/show_bug.cgi?id=99746">Bug 99746</a>] [META] PDF import filter in Draw </li> </ul> </div> </body> </html>