[Poppler-bugs] [Bug 56226] New: Poppler does not guard against invalid utf-8

bugzilla-daemon at freedesktop.org
Sat Oct 20 06:01:09 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=56226

          Priority: medium
            Bug ID: 56226
          Assignee: poppler-bugs at lists.freedesktop.org
           Summary: Poppler does not guard against invalid utf-8
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: benjamin at sipsolutions.net
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: cairo backend
           Product: poppler

Created attachment 68850
  --> https://bugs.freedesktop.org/attachment.cgi?id=68850&action=edit
ugly workaround

I have a PDF file that apparently contains the "Unicode character" 0xffff.
This is not a valid character (U+FFFF is a Unicode noncharacter), but poppler
passes it on to cairo anyway.

My guess is that the PDF file is broken in some way. Unfortunately, I cannot
provide the file in question because I do not have the rights to make it
public. I cannot even extract that single page, because pdftk refuses to open
the file.

I am attaching a patch that works around the issue. It is not a nice patch,
but it gets the job done: it simply copies the validity check from cairo.
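
For readers without access to the attachment, here is a minimal sketch of the
kind of check meant above. It is not the attached patch. The function names
are hypothetical, and the test is modeled on the code point validity test used
in GLib-style Unicode code, which I am assuming is close to what cairo applies.
Replacing the bad value with U+FFFD is also an assumption; a real fix might
skip the character instead.

```cpp
#include <cstdint>

// Hypothetical helper, not part of poppler or cairo.
// Returns true if c is a code point that may safely be encoded as UTF-8.
constexpr bool isValidUnicode(std::uint32_t c)
{
    return c < 0x110000u                    // inside the Unicode code space
        && (c & 0xFFFFF800u) != 0xD800u     // not a UTF-16 surrogate (D800..DFFF)
        && (c < 0xFDD0u || c > 0xFDEFu)     // not in the FDD0..FDEF noncharacter block
        && (c & 0xFFFEu) != 0xFFFEu;        // not U+xxFFFE or U+xxFFFF
}

// Hypothetical helper: substitute U+FFFD (REPLACEMENT CHARACTER) for
// anything that fails the check, so that only valid values reach the
// UTF-8 encoder.
constexpr std::uint32_t sanitizeUnicode(std::uint32_t c)
{
    return isValidUnicode(c) ? c : 0xFFFDu;
}
```

With this, the 0xffff value from the file in question would be mapped to
U+FFFD before being converted to UTF-8, while ordinary characters pass through
unchanged.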

This is what pdftotext prints for the section in question. I think the U+FFFF
characters are scaled braces {}, i.e. similar to what LaTeX would produce for:
  Im\left\{ \frac{S_{Last}}{30kVA} \right\}

The text:
"""
Im

<U+FFFF>

S Last
30kV A

<U+FFFF>
"""
