<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Glyphs in PDFs produced by Tesseract OCR render as white boxes when selected"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=107450">107450</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Glyphs in PDFs produced by Tesseract OCR render as white boxes when selected
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>poppler
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>glib frontend
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>poppler-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>barlow.jim+fds@gmail.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=140931" name="attach_140931" title="Test file">attachment 140931</a></span>
Test file

Tesseract OCR uses a glyphless font (a font with a single glyph that occupies
empty space) in the PDFs it produces.

When PDFs produced by Tesseract are rendered in a Poppler-based viewer and
text is selected, Poppler draws white boxes on top of the background image that
contains the text. The Tesseract team has worked hard on PDF viewer support and
compatibility; to my knowledge the Tesseract glyphless font works correctly in
Acrobat, Pdfium, PDF.js, macOS Preview, Dropbox PDF Viewer, MuPDF and
Ghostscript, with testing on multiple platforms, including mobile. Unlike
Poppler, these viewers do not attempt to render the glyphless font on top of
the background.

This was first reported against Evince, whose developers attribute the issue to Poppler:
<a href="https://gitlab.gnome.org/GNOME/evince/issues/953">https://gitlab.gnome.org/GNOME/evince/issues/953</a>

See that issue for screenshots, since they cannot easily be attached here.

Related issues:
* <a href="https://github.com/jbarlow83/OCRmyPDF/issues/249">https://github.com/jbarlow83/OCRmyPDF/issues/249</a>
* <a href="https://github.com/jbarlow83/OCRmyPDF/issues/178">https://github.com/jbarlow83/OCRmyPDF/issues/178</a>

The design notes for the glyphless font may be relevant:
<a href="https://github.com/tesseract-ocr/tesseract/blob/master/src/api/pdfrenderer.cpp">https://github.com/tesseract-ocr/tesseract/blob/master/src/api/pdfrenderer.cpp</a></pre>
        </div>
      </p>


    </body>
</html>