[Libreoffice-bugs] [Bug 104770] Scanned PDF shows hidden text

bugzilla-daemon at bugs.documentfoundation.org
Wed Mar 20 13:37:52 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=104770

--- Comment #11 from V Stuart Foote <vstuart.foote at utsa.edu> ---
Issue remains with current master/6.3.0alpha0+.

However, as the OCR'd PDF is a bitmap, the text spans are annotations on that
image. Showing the annotations on import to Draw--where the PDF is broken out
to its component Draw elements--actually seems correct.

Inserting the PDF (pdfium-based, but just the first page of the PDF for now)
renders the PDF page as an image.

The inserted "Image" can be selected and, with "Break", split into its
component text and the scanned newspaper page. After the break, the scanned
image can be selected and removed, leaving just the OCR text as Draw text
frames. It is slow, and the utility of this is questionable--but then
manipulating PDF content is questionable. The text from a PDF is not intended
to be manipulated.

-- 
You are receiving this mail because:
You are the assignee for the bug.

