[Poppler-bugs] [Bug 96313] New: "UTF-16" not native byte order on OS X iconv (re ustrings to_utf8)
bugzilla-daemon at freedesktop.org
Wed Jun 1 18:03:35 UTC 2016
https://bugs.freedesktop.org/show_bug.cgi?id=96313
Bug ID: 96313
Summary: "UTF-16" not native byte order on OS X iconv (re ustrings to_utf8)
Product: poppler
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: cpp frontend
Assignee: poppler-bugs at lists.freedesktop.org
Reporter: dev at karlchenofhell.org
Hi.
ustring::to_utf8() creates its converter as
MiniIconv ic("UTF-8", "UTF-16");
which assumes that iconv(3) interprets "UTF-16" in native byte order. On OS X with
Intel CPUs this assumption fails (I installed poppler through MacPorts, but the
problem is not specific to MacPorts; see below), as a quick check shows:
$ echo -n 7 | iconv -t utf-16 | hexdump -C
00000000 fe ff 00 37 |...7|
The output starts with the big-endian BOM fe ff, so iconv's "UTF-16" is UTF-16BE,
not the CPU's little-endian order.
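The same check can be done from C++ by converting "7" with iconv(3) and reading the BOM. This is a minimal sketch, not poppler code: `iconv_from_utf8()` and `utf16_bom_order()` are hypothetical helpers, and on OS X you may need to link with `-liconv`. On the reporter's setup it should report "BE", matching the hexdump above; other iconv implementations may choose a different order for plain "UTF-16".

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-8 `in` to the iconv encoding `to` and
// returns the raw output bytes, or an empty vector if iconv fails.
std::vector<unsigned char> iconv_from_utf8(const char *to, const std::string &in)
{
    assert(to != nullptr);
    std::vector<unsigned char> out;
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1) {
        return out;
    }
    std::vector<char> src(in.begin(), in.end());
    // Enough room for UTF-16 or UTF-32 output plus a BOM.
    std::vector<char> dst(4 * in.size() + 4);
    char *inbuf = src.data();
    std::size_t inleft = src.size();
    char *outbuf = dst.data();
    std::size_t outleft = dst.size();
    std::size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc != (std::size_t)-1) {
        out.assign(dst.data(), outbuf);  // only the bytes iconv actually wrote
    }
    return out;
}

// Hypothetical helper: reports the byte order announced by a leading
// UTF-16 BOM as "BE", "LE", or "none".
std::string utf16_bom_order(const std::vector<unsigned char> &bytes)
{
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        return "BE";
    }
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        return "LE";
    }
    return "none";
}
```

Calling `utf16_bom_order(iconv_from_utf8("UTF-16", "7"))` tells you which byte order your iconv picked for plain "UTF-16", which is exactly the information `to_utf8()` currently assumes.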
This breaks page labels for me: instead of "78" (in UTF-8), they come back as the
(hex) bytes
e3 9c 80 e3 a0 80
which is U+3700 U+3800 in UTF-8, i.e. the code units of "7" (U+0037) and "8"
(U+0038) with their bytes swapped.
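The arithmetic can be reproduced directly. This sketch (hypothetical helpers `bmp_to_utf8()` and `misread_le_as_be()`, not poppler API) lays out each code unit in little-endian memory order, reads the two bytes back the way a UTF-16BE decoder would, and encodes the result as UTF-8. It only handles BMP code points, which is all the example needs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encodes one BMP code point (U+0000..U+FFFF, excluding surrogates) as UTF-8.
std::string bmp_to_utf8(std::uint16_t cp)
{
    assert(cp < 0xD800 || cp > 0xDFFF);  // lone surrogates are not code points
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simulates the bug: code units stored little-endian (as on Intel) are
// decoded as big-endian, then converted to UTF-8.
std::string misread_le_as_be(const std::vector<std::uint16_t> &units)
{
    std::string out;
    for (std::uint16_t u : units) {
        unsigned char lo = u & 0xFF;  // first byte in little-endian memory
        unsigned char hi = u >> 8;    // second byte in little-endian memory
        // A UTF-16BE decoder treats the first byte as the high byte.
        std::uint16_t seen = static_cast<std::uint16_t>((lo << 8) | hi);
        out += bmp_to_utf8(seen);
    }
    return out;
}
```

For the page label "78", `misread_le_as_be({0x0037, 0x0038})` yields the bytes e3 9c 80 e3 a0 80 quoted above.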
Possible fixes:
- do not "decode" GooString's UTF-16BE to native byte order in
  detail::unicode_GooString_to_ustring(GooString *str);
- pass iconv(3) an explicit source encoding ("UTF-16LE" or "UTF-16BE") chosen
  with the BYTE_ORDER macro; or
- check the BOM that iconv(3) outputs (which, e.g.,
  ustring::from_utf8(const char *str, int len) currently just skips).
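A sketch of the second option, assuming the ustring's code units are available as `std::uint16_t` values in native byte order, as the report describes. `native_utf16_name()` and `utf16_native_to_utf8()` are hypothetical names, not poppler's `MiniIconv` API, and instead of the BYTE_ORDER macro (a platform header convention, not standard C++) it detects byte order at run time.

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Returns an explicit iconv encoding name for the CPU's byte order, so iconv
// never has to guess what plain "UTF-16" means.
const char *native_utf16_name()
{
    const std::uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);  // lowest-addressed byte of the probe
    return first == 0x02 ? "UTF-16LE" : "UTF-16BE";
}

// Converts native-order UTF-16 code units to UTF-8; throws on iconv failure.
std::string utf16_native_to_utf8(const std::vector<std::uint16_t> &units)
{
    if (units.empty()) {
        return std::string();
    }
    iconv_t cd = iconv_open("UTF-8", native_utf16_name());
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }
    std::vector<char> src(units.size() * 2);
    std::memcpy(src.data(), units.data(), src.size());  // raw bytes, native order
    // Each UTF-16 code unit becomes at most 3 UTF-8 bytes
    // (a surrogate pair, i.e. two units, becomes 4).
    std::vector<char> dst(units.size() * 3);
    char *inbuf = src.data();
    std::size_t inleft = src.size();
    char *outbuf = dst.data();
    std::size_t outleft = dst.size();
    std::size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        throw std::runtime_error("iconv conversion failed");
    }
    assert(inleft == 0);  // all input consumed
    return std::string(dst.data(), outbuf);
}
```

With this, `utf16_native_to_utf8({0x0037, 0x0038})` returns "78" on both little- and big-endian hosts. On a C++20 compiler, `std::endian::native` from `<bit>` could replace the run-time probe.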