[Poppler-bugs] [Bug 56226] New: Poppler does not guard against invalid utf-8
bugzilla-daemon at freedesktop.org
Sat Oct 20 06:01:09 PDT 2012
https://bugs.freedesktop.org/show_bug.cgi?id=56226
Priority: medium
Bug ID: 56226
Assignee: poppler-bugs at lists.freedesktop.org
Summary: Poppler does not guard against invalid utf-8
Severity: normal
Classification: Unclassified
OS: All
Reporter: benjamin at sipsolutions.net
Hardware: Other
Status: NEW
Version: unspecified
Component: cairo backend
Product: poppler
Created attachment 68850
--> https://bugs.freedesktop.org/attachment.cgi?id=68850&action=edit
ugly workaround
I have a PDF file that apparently contains the "unicode character" 0xffff
(U+FFFF). Obviously this is an invalid character, but poppler insists on
passing it to cairo anyway.
My guess is that the PDF file is broken in some way. Unfortunately, I cannot
provide the file in question because I do not have the right to make it
public. I cannot even extract that single page, because pdftk refuses to open
the file.
I am attaching a patch that works around the issue. It is not a nice patch by
any measure, but it gets the job done: it simply copies the validity check
from cairo.
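For readers without the attachment, here is a rough sketch of the kind of
validity check meant above. It is modeled on the code point test commonly
found in GLib-derived UTF-8 code; that it matches the attached patch or
cairo's current source line for line is an assumption. The function name
isValidCodePoint is hypothetical and not part of poppler or cairo.

```cpp
#include <cstdint>

// Hypothetical helper (not poppler or cairo API): returns true when `c` is a
// code point that a strict UTF-8 encoder would accept.
constexpr bool isValidCodePoint(std::uint32_t c)
{
    return c < 0x110000u                 // inside the Unicode code space
        && (c & 0xFFFFF800u) != 0xD800u  // not a surrogate (U+D800..U+DFFF)
        && (c < 0xFDD0u || c > 0xFDEFu)  // not a noncharacter in U+FDD0..U+FDEF
        && (c & 0xFFFEu) != 0xFFFEu;     // not U+xxFFFE or U+xxFFFF
}

// Compile-time checks: U+FFFF is rejected, an ordinary '{' is accepted.
static_assert(!isValidCodePoint(0xFFFF), "U+FFFF must be rejected");
static_assert(isValidCodePoint(0x7B), "'{' must be accepted");
```

A caller would presumably test each code point before handing text to cairo
and either skip the offending character or substitute U+FFFD; which of the two
the workaround does is not stated here.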
This is what pdftotext prints for the section in question. I think the U+FFFF
characters are scaled braces ({}), i.e. similar to what LaTeX would create for:
Im\left\{ \frac{S_{Last}}{30kVA} \right\}
The Text:
"""
Im
<U+FFFF>
S Last
30kV A
<U+FFFF>
"""
--
You are receiving this mail because:
You are the assignee for the bug.