[Poppler-bugs] [Bug 107317] New: Fix HtmlFont::HtmlFilter to not lose tabs

bugzilla-daemon at freedesktop.org
Sat Jul 21 03:46:43 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107317

            Bug ID: 107317
           Summary: Fix HtmlFont::HtmlFilter to not lose tabs
           Product: poppler
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: utils
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: ulatekh at yahoo.com

Created attachment 140749
  --> https://bugs.freedesktop.org/attachment.cgi?id=140749&action=edit
Patch to fix bug

I'm about to use pdftohtml to extract information from PDFs and organize the
results into a database, so I had a chance to dig through the code.

I've had a long-standing problem with qpdfview (which uses poppler) sometimes
copying text out of PDFs incorrectly -- the text copies, but all of the spaces
are missing. After reproducing it with a PDF, I tracked the problem down to the
PDF using tabs where it probably should have used spaces. The patch fixes
HtmlFont::HtmlFilter() to convert incoming tabs to spaces, instead of removing
the whitespace completely.
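
As a rough illustration of the behaviour described above, here is a minimal
standalone sketch. It is not the attached patch and does not reproduce
poppler's code: filterTextForHtml is a hypothetical stand-in for
HtmlFont::HtmlFilter(), and the std::u32string input and the numeric
character references for non-ASCII are assumptions made to keep it short.

```cpp
#include <string>

// Hypothetical helper, NOT poppler's HtmlFont::HtmlFilter().
// Escapes HTML-special characters and maps a tab to a space, so that
// whitespace between words is preserved instead of being dropped.
std::string filterTextForHtml(const std::u32string &text)
{
    std::string out;
    for (char32_t c : text) {
        switch (c) {
        case U'\t': // the point of the fix: treat a tab like a space
        case U' ':
            out += ' ';
            break;
        case U'&':
            out += "&amp;";
            break;
        case U'<':
            out += "&lt;";
            break;
        case U'>':
            out += "&gt;";
            break;
        case U'"':
            out += "&quot;";
            break;
        default:
            if (c < 0x80) {
                // Plain ASCII passes through unchanged.
                out += static_cast<char>(c);
            } else {
                // Assumption: emit a numeric character reference rather
                // than encoding the code point as UTF-8.
                out += "&#" + std::to_string(static_cast<unsigned long>(c)) + ";";
            }
            break;
        }
    }
    return out;
}
```

With this sketch, the input "foo<TAB>bar" comes out as "foo bar" rather than
"foobar", which is the difference the patch is meant to make.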

There are probably other places in the code where the fix in this patch could
be applied, e.g. when copying text in qpdfview.
