[Poppler-bugs] [Bug 107450] New: Glyphs in PDFs produced by Tesseract OCR render as white boxes when selected

bugzilla-daemon at freedesktop.org
Wed Aug 1 21:20:34 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107450

            Bug ID: 107450
           Summary: Glyphs in PDFs produced by Tesseract OCR render as
                    white boxes when selected
           Product: poppler
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: glib frontend
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: barlow.jim+fds at gmail.com

Created attachment 140931
  --> https://bugs.freedesktop.org/attachment.cgi?id=140931&action=edit
Test file

Tesseract OCR uses a glyphless font (a font with a single glyph that occupies
empty space) in the PDFs it produces.
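As a rough illustration of how an OCR text layer like this can sit over a scanned page, here is a minimal sketch of a PDF content stream generator. It is not Tesseract's code. The helper name `ocr_page_content` and the resource names `/Im1` (page image) and `/F1` (glyphless font) are hypothetical. The use of text render mode 3 (`3 Tr`, "neither fill nor stroke") to keep the text invisible is an assumption based on the PDF specification, and is not stated in this report. Literal strings are used for readability; a real generator may encode glyphs differently.

```python
from __future__ import annotations


def ocr_page_content(page_w: float, page_h: float,
                     words: list[tuple[str, float, float, float]]) -> str:
    """Build a PDF content stream: a scanned image plus an invisible text layer.

    Hypothetical helper: /Im1 and /F1 are assumed resource names for the page
    image and the glyphless font; words are (text, x, y, font_size) in points.
    """
    ops = []
    # Paint the scanned image XObject scaled to fill the whole page.
    ops.append(f"q {page_w:g} 0 0 {page_h:g} 0 0 cm /Im1 Do Q")
    # Begin a text object in render mode 3 (neither fill nor stroke), so the
    # text is invisible but can still be selected and searched.
    ops.append("BT 3 Tr")
    for text, x, y, size in words:
        # Escape characters that are special inside PDF literal strings.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        # Select the font, position the word over the image, and show it.
        ops.append(f"/F1 {size:g} Tf 1 0 0 1 {x:g} {y:g} Tm ({escaped}) Tj")
    ops.append("ET")
    return "\n".join(ops)


print(ocr_page_content(612, 792, [("Hello", 72, 700, 12)]))
```

A viewer that honours the invisible render mode shows only the image, and selection highlights the word's bounding box rather than painting the glyphs.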

When PDFs produced by Tesseract are displayed and text is selected, Poppler
draws white boxes on top of the background image that contains the text. The
Tesseract team has worked hard on PDF viewer support and compatibility; to my
knowledge the Tesseract glyphless font works correctly in Acrobat, Pdfium,
PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and Ghostscript, with testing
on multiple platforms, including mobile. These viewers do not attempt to
render the glyphless font on top of the background.

This was first reported against Evince, whose developers say the issue is in
Poppler:
https://gitlab.gnome.org/GNOME/evince/issues/953

See that issue for screenshots, since they cannot easily be added here.

Related issues:
* https://github.com/jbarlow83/OCRmyPDF/issues/249
* https://github.com/jbarlow83/OCRmyPDF/issues/178

The design notes for the glyphless font, in Tesseract's PDF renderer source,
may be relevant:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/pdfrenderer.cpp
