[OCR] Extract text layer, fix errors, re-import?

Gilles codecomplete at free.fr
Fri Aug 30 10:47:21 UTC 2024


Never mind: I'll just convert the PDF to EPUB, and edit the HTML files 
it contains.
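
In case it helps someone with the same problem: an EPUB is a ZIP archive, so the HTML files can also be corrected with a short script. The following is only a rough sketch, not something from Abbyy or Sigil. The function name fix_epub_text and the corrections mapping are made up for illustration, and it assumes the HTML files are UTF-8 encoded.

```python
import zipfile

HTML_SUFFIXES = (".xhtml", ".html", ".htm")


def fix_epub_text(src_path, dst_path, corrections):
    """Copy an EPUB, applying plain string replacements to its HTML files.

    `corrections` is a hypothetical {wrong: right} mapping of OCR typos.
    Returns the names of the HTML members whose content changed.
    Assumes the HTML members are UTF-8 encoded.
    """
    changed = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # Iterate in the original order so "mimetype" stays the first entry.
        for info in src.infolist():
            if info.is_dir():
                continue
            data = src.read(info.filename)
            if info.filename.lower().endswith(HTML_SUFFIXES):
                text = data.decode("utf-8")
                fixed = text
                for wrong, right in corrections.items():
                    fixed = fixed.replace(wrong, right)
                if fixed != text:
                    changed.append(info.filename)
                data = fixed.encode("utf-8")
            # The EPUB container expects "mimetype" to be stored uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)
    return changed
```

It would be called as fix_epub_text("book.epub", "book-fixed.epub", {"rnonde": "monde"}), where the file names and the typo are placeholders. Plain string replacement is blunt, since it would also touch a match inside a tag or attribute, so it is worth checking the list of changed files before trusting the result.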

On 29/08/2024 21:08, Gilles wrote:
> Hello,
>
> I noticed some typos in the text layer that an OCR tool added to a "bitmap" 
> PDF, i.e. one whose pages are actually scanned images.
>
> I first tried opening the EPUB generated by ABBYY FineReader, but 
> LibreOffice couldn't open it at all. Sigil could open it after showing 
> an error message, but it lacks a French dictionary for spell-checking 
> (as far as I can tell).
>
> As an alternative, pdftotext or mutool (convert) can extract the text 
> layer from such a PDF, but can they put it back after I have fixed the typos?
>
> Thank you.
>



More information about the poppler mailing list