[Poppler-bugs] [Bug 99506] New: pdftotext should filter control characters like "form feed"

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Jan 23 15:17:27 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=99506

            Bug ID: 99506
           Summary: pdftotext should filter control characters like "form
                    feed"
           Product: poppler
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: utils
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: mike at sprachgewalt.de

Created attachment 129108
  --> https://bugs.freedesktop.org/attachment.cgi?id=129108&action=edit
Example PDF

Currently, pdftotext/TextOutputDev extracts control characters such as form
feeds from the PDF and writes them to the output. These should be filtered,
because users expect every form feed in the output to have been inserted by
pdftotext itself.
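
The requested behaviour can be sketched as a small filter. This is only an
illustration in Python, not poppler's TextOutputDev code: the names
strip_control_chars and join_pages are hypothetical, and the sketch assumes
the text of each page is available as a separate string before the page
separator is written.

```python
import unicodedata

# Whitespace controls that are legitimate in plain-text output (assumption).
KEEP = {"\n", "\r", "\t"}


def strip_control_chars(page_text: str) -> str:
    """Drop control characters (Unicode category Cc) except newline, CR and tab."""
    return "".join(
        ch for ch in page_text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )


def join_pages(pages: list[str]) -> str:
    """Filter each page, then terminate it with a form feed.

    After filtering, the only form feeds left in the result are the ones
    added here as page separators.
    """
    return "".join(strip_control_chars(page) + "\f" for page in pages)
```

With this filter, a form feed that comes from the page content is dropped,
while the one form feed per page added by join_pages is kept.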

In the attached PDF, a form feed character (0x0C) is extracted between the
word "sich" and the formula that follows it. AFAICT, this form feed is
actually a character from the CMSY10 font.
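
To locate such characters in extracted text, a helper along these lines may
be useful. It is a hypothetical sketch (find_control_chars is not part of
poppler) and it cannot tell an extracted form feed from a page separator, so
it is best run on the text of a single page, where at most a trailing form
feed would be expected.

```python
def find_control_chars(text, context=10):
    """Return (offset, code point, surrounding text) for each C0 control
    character in text, ignoring newline, carriage return and tab."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) < 0x20 and ch not in "\n\r\t":
            # Keep a little text on both sides to show where the hit occurred.
            snippet = text[max(0, i - context): i + context + 1]
            hits.append((i, ord(ch), snippet))
    return hits
```

Each hit reports the offset and the code point (12 for a form feed), plus
enough context to see which word the character follows.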
