<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Priority</th>
<td>medium
</td>
</tr>
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Poppler does not guard against invalid utf-8"
href="https://bugs.freedesktop.org/show_bug.cgi?id=56226">56226</a>
</td>
</tr>
<tr>
<th>Assignee</th>
<td>poppler-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<th>Summary</th>
<td>Poppler does not guard against invalid utf-8
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Reporter</th>
<td>benjamin@sipsolutions.net
</td>
</tr>
<tr>
<th>Hardware</th>
<td>Other
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Component</th>
<td>cairo backend
</td>
</tr>
<tr>
<th>Product</th>
<td>poppler
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=68850" name="attach_68850" title="ugly workaround">attachment 68850</a> <a href="attachment.cgi?id=68850&amp;action=edit" title="ugly workaround">[details]</a></span>
ugly workaround
I have a PDF file that apparently contains the "Unicode character" 0xffff.
This is obviously an invalid character, but poppler insists on passing it
on to cairo.
My guess is that the PDF file is broken in some way. Unfortunately, I am not
able to provide the file in question because I do not have the rights to make
it public. I cannot even extract that single page, because pdftk
refuses to open the file.
I am attaching a patch that works around the issue. It is not a very nice
patch, but it gets the job done: it simply copies the validity check
from cairo.
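For illustration, here is a minimal sketch of that kind of validity check. It is
written in Python for brevity and is not the attached patch. It assumes the check
rejects values outside the Unicode range, surrogates, and noncharacters such as
U+FFFF; the helper names and the choice to substitute U+FFFD are hypothetical.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER (assumed substitute)


def is_valid_code_point(cp: int) -> bool:
    """Hypothetical helper: True if cp is safe to hand to a text API."""
    # Outside the Unicode code space (negative or above U+10FFFF).
    if cp not in range(0x110000):
        return False
    # UTF-16 surrogates U+D800..U+DFFF are not valid scalar values.
    if cp in range(0xD800, 0xE000):
        return False
    # Noncharacters U+FDD0..U+FDEF.
    if cp in range(0xFDD0, 0xFDF0):
        return False
    # The last two code points of every plane (U+xxFFFE, U+xxFFFF)
    # are noncharacters as well; this is the case hit by U+FFFF.
    if cp % 0x10000 >= 0xFFFE:
        return False
    return True


def replace_invalid(code_points):
    """Substitute REPLACEMENT for every code point that fails the check."""
    return [cp if is_valid_code_point(cp) else REPLACEMENT
            for cp in code_points]
```

With this sketch, replace_invalid([0x49, 0x6D, 0xFFFF]) gives [0x49, 0x6D, 0xFFFD].
Dropping the offending code point instead of replacing it would work just as well.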
This is what pdftotext prints for the section in question. I think the U+FFFF
characters are scaled braces ({ and }), i.e. similar to what LaTeX would create for:
Im\left\{ \frac{S_{Last}}{30kVA} \right\}
The text:
"""
Im
&lt;U+FFFF&gt;
S Last
30kV A
&lt;U+FFFF&gt;
"""</pre>
</div>
</p>
</body>
</html>