<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - pdftotext should filter control characters like &quot;form feed&quot;"
href="https://bugs.freedesktop.org/show_bug.cgi?id=99506">99506</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>pdftotext should filter control characters like "form feed"
</td>
</tr>
<tr>
<th>Product</th>
<td>poppler
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>Other
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>medium
</td>
</tr>
<tr>
<th>Component</th>
<td>utils
</td>
</tr>
<tr>
<th>Assignee</th>
<td>poppler-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>mike@sprachgewalt.de
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=129108" name="attach_129108" title="Example PDF">attachment 129108</a></span>
Example PDF
Currently, pdftotext/TextOutputDev passes control characters such as form feeds
from the PDF through to its output. These should be filtered out, since users
expect every form feed in the output to have been inserted by pdftotext itself
(as a page break).
In the attached PDF, a form feed character (0x0C) is extracted between the
word "sich" and the formula that follows it. As far as I can tell, the form feed
is actually a character from the CMSY10 font.</pre>
</div>
</p>
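<p>A minimal sketch of the requested behavior, assuming the filtering is applied to each page's extracted text before the page separators are added. The names <code>FILTERED</code>, <code>strip_control_chars</code> and <code>join_pages</code> are hypothetical and not part of poppler's API; keeping only tab and newline among the C0 control characters is also an assumption.</p>
<pre>
```python
# Hypothetical sketch, not poppler code: drop control characters that come
# from the PDF's own glyphs, then add page separators afterwards, so every
# form feed in the result was inserted by the extractor itself.

# C0 control characters (U+0000..U+001F) and DEL (U+007F), except the tab
# and newline that plain-text output legitimately contains (an assumption).
FILTERED = frozenset(chr(cp) for cp in list(range(0x20)) + [0x7F]) - {"\t", "\n"}


def strip_control_chars(text: str):
    """Return text with every character in FILTERED removed."""
    return "".join(ch for ch in text if ch not in FILTERED)


def join_pages(pages: list):
    """Clean each page, then end it with a form feed as the page separator."""
    return "".join(strip_control_chars(page) + "\f" for page in pages)


if __name__ == "__main__":
    # Hypothetical sample: a stray form feed between "sich" and a formula.
    pages = ["sich\f x + y = z\n", "Seite 2\n"]
    print(repr(join_pages(pages)))
```
</pre>
<p>Cleaning each page before appending the separator is the key design choice: filtering the joined output instead could not tell a glyph-derived form feed from a real page break.</p>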
</body>
</html>