[Poppler-bugs] [Bug 76971] New: Problem with non-BMP Unicode characters

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Apr 2 16:53:16 PDT 2014


https://bugs.freedesktop.org/show_bug.cgi?id=76971

          Priority: medium
            Bug ID: 76971
          Assignee: poppler-bugs at lists.freedesktop.org
           Summary: Problem with non-BMP Unicode characters
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: freedesktop at behdad.org
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: general
           Product: poppler

Created attachment 96814
  --> https://bugs.freedesktop.org/attachment.cgi?id=96814&action=edit
Sample document

The attached PDF was generated by cairo from printing a gedit document with one
character: U+1D780.  Here it is in text: "𝞀".  This is an example of what we
call a "non-BMP" Unicode character, i.e. one whose code point is > 0xFFFF.  It
doesn't fit in two bytes, which means it doesn't fit in one UTF-16 code unit
and has to be encoded as a surrogate pair.
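
To make the encoding concrete, here is a small Python sketch (not part of the
original report, and unrelated to poppler's own code) showing how U+1D780 is
represented in UTF-16 and in UTF-8.  The variable names are purely
illustrative.

```python
# U+1D780 lies above 0xFFFF, so it is outside the Basic Multilingual Plane.
ch = "\U0001D780"
cp = ord(ch)

# UTF-16 needs two 16-bit code units (a surrogate pair) for this character.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# UTF-8 needs four bytes for the same character.
utf8 = ch.encode("utf-8")

print(hex(cp))                   # 0x1d780
print([hex(u) for u in units])   # ['0xd835', '0xdf80']
print(utf8.hex())                # f09d9e80
```

The high surrogate (0xD835) and the low surrogate (0xDF80) only have meaning
together; neither is a valid character on its own.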

Printing the attached PDF from evince to a PDF file fails.  Evince generates
the following cairo error:

  cairo context error: input string not valid UTF-8

I think what's happening is that something somewhere in the poppler chain is
not handling UTF-16 surrogate pairs, or is mishandling this character in some
other way.
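
As an illustration of the suspected failure mode, the following Python sketch
shows how converting each UTF-16 code unit to UTF-8 on its own yields bytes
that a strict validator rejects, while combining the surrogate pair first
yields valid UTF-8.  This is only a guess at the kind of bug involved: the
function names are hypothetical and none of this is taken from poppler,
evince or cairo.

```python
def combine_surrogates(units):
    """Turn UTF-16 code units into code points, joining surrogate pairs."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            out.append(0xFFFD)  # lone surrogate: substitute U+FFFD
            i += 1
        else:
            out.append(u)
            i += 1
    return out


def naive_utf8(units):
    """Hypothetical buggy path: treat every code unit as a code point."""
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def correct_utf8(units):
    """Combine surrogate pairs first, then encode the code points."""
    return "".join(chr(c) for c in combine_surrogates(units)).encode("utf-8")


def is_valid_utf8(data):
    """Strict check: encoded surrogates are not valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


pair = [0xD835, 0xDF80]                 # UTF-16 form of U+1D780
print(is_valid_utf8(naive_utf8(pair)))    # False
print(is_valid_utf8(correct_utf8(pair)))  # True
```

A strict consumer that is handed the output of the naive path would be
expected to reject it as invalid UTF-8, which would be consistent with the
error message above.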
