[Poppler-bugs] [Bug 99506] New: pdftotext should filter control characters like "form feed"
bugzilla-daemon at freedesktop.org
Mon Jan 23 15:17:27 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=99506
Bug ID: 99506
Summary: pdftotext should filter control characters like "form feed"
Product: poppler
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: utils
Assignee: poppler-bugs at lists.freedesktop.org
Reporter: mike at sprachgewalt.de
Created attachment 129108
--> https://bugs.freedesktop.org/attachment.cgi?id=129108&action=edit
Example PDF
Currently, pdftotext/TextOutputDev extracts control characters like form feeds
from the PDF. These should be filtered, as users expect form feeds to be
inserted only by pdftotext itself, as page separators.
In the attached PDF, a form feed character (0x0C) is extracted between the
word "sich" and the formula that follows it. AFAICT, this form feed is actually
a character from the CMSY10 font.
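
To illustrate the requested behaviour, here is a minimal sketch in Python. It is
not poppler's code: the names strip_control_chars, join_pages and KEEP are
hypothetical, and the choice to keep tab, newline and carriage return is an
assumption. It drops control characters that come from the page content and
then adds exactly one form feed per page as the separator.

```python
import unicodedata
from typing import Iterable

# Assumption: these whitespace controls are legitimate in extracted text.
KEEP = {"\t", "\n", "\r"}


def strip_control_chars(text: str) -> str:
    """Drop characters in Unicode category Cc (control), except those in KEEP."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )


def join_pages(pages: Iterable[str]) -> str:
    """Filter each page's text, then end each page with a single form feed.

    The form feed added here is the only one that reaches the output, so a
    consumer can rely on it as a page separator.
    """
    return "".join(strip_control_chars(page) + "\f" for page in pages)
```

With this sketch, a stray 0x0C inside a page (as between "sich" and the formula)
is removed, while the page-ending form feed is preserved. The same filter could
also be applied as a post-processing step on per-page output.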