[Poppler-bugs] [Bug 107450] New: Glyphs in PDFs produced by Tesseract OCR render as white boxes when selected
bugzilla-daemon at freedesktop.org
Wed Aug 1 21:20:34 UTC 2018
https://bugs.freedesktop.org/show_bug.cgi?id=107450
Bug ID: 107450
Summary: Glyphs in PDFs produced by Tesseract OCR render as
white boxes when selected
Product: poppler
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: glib frontend
Assignee: poppler-bugs at lists.freedesktop.org
Reporter: barlow.jim+fds at gmail.com
Created attachment 140931
--> https://bugs.freedesktop.org/attachment.cgi?id=140931&action=edit
Test file
Tesseract OCR uses a glyphless font (a font with a single glyph that occupies
empty space) in the PDFs it produces.
When a PDF produced by Tesseract is rendered and its text is selected, Poppler
draws white boxes on top of the background image that contains the text. The
Tesseract team has put considerable work into PDF viewer compatibility: to my
knowledge the Tesseract glyphless font works correctly in Acrobat, Pdfium,
PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, tested on
multiple platforms including mobile. These other PDF viewers do not attempt to
render the glyphless font on top of the background.
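To check whether a given page carries this kind of hidden text layer, one
option is to look at the text rendering mode set in its content stream. The
sketch below assumes the OCR layer is drawn in text rendering mode 3 ("neither
fill nor stroke", i.e. invisible, per the PDF specification); this report does
not state that, so treat it as an assumption. It also assumes the content
stream has already been decompressed by some other tool. The function names
and the sample stream are hypothetical and only illustrate the idea.

```python
import re

# The PDF operator "Tr" sets the text rendering mode. Mode 3 means
# "neither fill nor stroke", so the text is invisible but still selectable.
# This is a naive scan: it does not parse string literals, so a literal
# such as "(3 Tr)" inside shown text would be reported as well.
_TR_PATTERN = re.compile(rb"(?<![\w.])(\d+)\s+Tr\b")


def text_render_modes(content: bytes) -> list[int]:
    """Return every text rendering mode set in an uncompressed content stream."""
    return [int(m.group(1)) for m in _TR_PATTERN.finditer(content)]


def has_invisible_text(content: bytes) -> bool:
    """True if the stream switches to rendering mode 3 at least once."""
    return 3 in text_render_modes(content)


# Hypothetical content stream fragment, not taken from the attached test file.
SAMPLE = b"BT\n3 Tr\n/F1 12 Tf\n<0001> Tj\nET\n"

if __name__ == "__main__":
    print(has_invisible_text(SAMPLE))
```

If a stream from an affected file reports mode 3, a viewer would not be
expected to paint those glyphs at all; only the selection highlight should be
drawn.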
This was first reported against Evince, where the issue was attributed to Poppler:
https://gitlab.gnome.org/GNOME/evince/issues/953
See that issue for screenshots, since screenshots cannot easily be attached here.
Related issues:
* https://github.com/jbarlow83/OCRmyPDF/issues/249
* https://github.com/jbarlow83/OCRmyPDF/issues/178
The design notes of the glyphless font may be relevant.
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/pdfrenderer.cpp