[OCR] Extract text layer, fix errors, re-import?

Gilles codecomplete at free.fr
Thu Aug 29 19:08:10 UTC 2024


Hello,

I noticed some typos in the text layer that OCR added to a "bitmap" 
PDF, i.e. one whose pages are actually scanned images.

I first tried editing the EPUB generated by ABBYY FineReader, but 
LibreOffice couldn't open it at all. Sigil could open it after showing an 
error message, but as far as I can tell it lacks a French dictionary to 
spell-check the text.

As an alternative, pdftotext or mutool (convert) can extract the text 
layer from such a PDF, but can either of them put it back once I have 
fixed the typos?

Thank you.
