<html> <head> <base href="https://bugs.documentfoundation.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Scanned PDF shows hidden text" href="https://bugs.documentfoundation.org/show_bug.cgi?id=104770#c11">Comment # 11</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Scanned PDF shows hidden text" href="https://bugs.documentfoundation.org/show_bug.cgi?id=104770">bug 104770</a> from <a class="email" href="mailto:vstuart.foote@utsa.edu" title="V Stuart Foote <vstuart.foote@utsa.edu>"> V Stuart Foote</a> <pre>The issue remains with current master/6.3.0alpha0+. However, because the OCR'd PDF is a bitmap, the text spans are annotations on that image. Showing the annotations on import to Draw, where the PDF is broken out into its component Draw elements, actually seems correct. Inserting the PDF (pdfium-based, but only the first page of the PDF for now) renders the PDF page as an image. The inserted "Image" can be selected and, with "Break", split into its component text and the scanned newspaper page. After the break, the scanned image can be selected and removed, leaving just the OCR text as Draw text frames. It is slow, and the utility of this is questionable, but then manipulating PDF content is questionable. The text from a PDF is not intended to be manipulated.</pre> </div> </body> </html>