[Poppler-bugs] [Bug 76971] New: Problem with non-BMP Unicode characters
bugzilla-daemon at freedesktop.org
Wed Apr 2 16:53:16 PDT 2014
https://bugs.freedesktop.org/show_bug.cgi?id=76971
Priority: medium
Bug ID: 76971
Assignee: poppler-bugs at lists.freedesktop.org
Summary: Problem with non-BMP Unicode characters
Severity: normal
Classification: Unclassified
OS: All
Reporter: freedesktop at behdad.org
Hardware: Other
Status: NEW
Version: unspecified
Component: general
Product: poppler
Created attachment 96814
--> https://bugs.freedesktop.org/attachment.cgi?id=96814&action=edit
Sample document
The attached PDF was generated by cairo by printing a gedit document containing a
single character: U+1D780. Here it is in text: "𝞀". This is an example of what we
call a "non-BMP" Unicode character, i.e. one whose code point is greater than
0xFFFF. It doesn't fit in two bytes, which means it doesn't fit in a single
UTF-16 code unit and has to be encoded as a surrogate pair.
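For illustration, here is a small Python sketch (not part of the original report) showing how U+1D780 is represented in UTF-16 and UTF-8. The variable names are only for this example.

```python
# The character from the sample document, written as an escape.
ch = "\U0001D780"
cp = ord(ch)

# Encode as big-endian UTF-16 and split the bytes into 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# Encode as UTF-8 for comparison.
utf8 = ch.encode("utf-8")

print(hex(cp))                    # code point, above 0xFFFF
print([hex(u) for u in units])    # two code units: a high and a low surrogate
print(utf8.hex(" "))              # four bytes in UTF-8
```

The character takes two UTF-16 code units (0xD835, 0xDF80) and four UTF-8 bytes (f0 9d 9e 80).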
Printing the attached PDF from evince to a PDF file fails. Evince generates
the following cairo error:
cairo context error: input string not valid UTF-8
I think what's happening is that something in the poppler chain is not
handling UTF-16 surrogate pairs, or is mishandling the text in some other way.
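To make the suspected failure mode concrete, here is a hedged Python sketch. It is an assumption about the kind of bug that could produce this error, not a diagnosis of the actual poppler code. The function names naive_utf16_to_utf8, utf16_to_utf8 and is_valid_utf8 are hypothetical. The naive converter treats each 16-bit code unit as a complete code point, while the second one combines surrogate pairs first.

```python
def naive_utf16_to_utf8(units):
    # Buggy on purpose: each 16-bit unit is encoded on its own, so the two
    # surrogates become two 3-byte sequences that are not valid UTF-8.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def utf16_to_utf8(units):
    # Combine high/low surrogate pairs into one code point before encoding.
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            cp = 0xFFFD  # unpaired surrogate: substitute U+FFFD
            i += 1
        else:
            cp = u
            i += 1
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


def is_valid_utf8(data):
    # Strict check, similar in spirit to what a UTF-8 consumer would do.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


units = [0xD835, 0xDF80]  # U+1D780 as a UTF-16 surrogate pair
bad = naive_utf16_to_utf8(units)
good = utf16_to_utf8(units)
print(bad.hex(" "), is_valid_utf8(bad))
print(good.hex(" "), is_valid_utf8(good))
```

A strict UTF-8 consumer rejects the naive output, which would be consistent with the "input string not valid UTF-8" error above. The pair-aware output is accepted.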