[Poppler-bugs] [Bug 96313] New: "UTF-16" not native byte order on OS X iconv (re ustrings to_utf8)

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Jun 1 18:03:35 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=96313

            Bug ID: 96313
           Summary: "UTF-16" not native byte order on OS X iconv (re
                    ustrings to_utf8)
           Product: poppler
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: cpp frontend
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: dev at karlchenofhell.org

Hi.

ustring::to_utf8() creates a

  MiniIconv ic("UTF-8", "UTF-16");

assuming that iconv(3) uses the native byte order for "UTF-16". On OS X with
Intel CPUs this assumption fails (I installed poppler through MacPorts, but the
issue is unrelated to that, see below), as a quick

  $ echo -n 7 | iconv -t utf-16 |  hexdump -C
  00000000  fe ff 00 37                                       |...7|

reveals: the output is a big-endian BOM (fe ff) followed by 00 37, i.e.
UTF-16BE, even though the CPU is little-endian.

This breaks page labels for me: instead of "78" (UTF-8), they return the bytes
(hex)

  e3 9c 80 e3 a0 80

which is the UTF-8 encoding of U+3700 U+3800, i.e. the code units 0x0037
0x0038 with their bytes swapped.
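
To illustrate the arithmetic, here is a minimal standalone sketch (not poppler
code). It assumes a little-endian host on which the label's code units 0x0037
0x0038 sit in memory as the bytes 37 00 38 00. The helpers read_utf16 and
bmp_to_utf8 are hypothetical names made up for this example, and the encoder
only handles BMP code units without surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: combine byte pairs into UTF-16 code units,
// reading them in the requested byte order.
std::vector<std::uint16_t> read_utf16(const std::vector<std::uint8_t> &bytes,
                                      bool big_endian)
{
    assert(bytes.size() % 2 == 0); // whole code units only
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const std::uint16_t hi = big_endian ? bytes[i] : bytes[i + 1];
        const std::uint16_t lo = big_endian ? bytes[i + 1] : bytes[i];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}

// Hypothetical helper: encode BMP code units as UTF-8.
// Surrogate pairs are deliberately not handled in this sketch.
std::string bmp_to_utf8(const std::vector<std::uint16_t> &units)
{
    std::string out;
    for (std::uint16_t u : units) {
        if (u < 0x80) {
            out += static_cast<char>(u);
        } else if (u < 0x800) {
            out += static_cast<char>(0xC0 | (u >> 6));
            out += static_cast<char>(0x80 | (u & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (u >> 12));
            out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (u & 0x3F));
        }
    }
    return out;
}
```

Reading the bytes 37 00 38 00 as little-endian yields "78". Reading the same
bytes as big-endian, as an iconv that treats "UTF-16" as UTF-16BE would, yields
e3 9c 80 e3 a0 80.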


A fix might be to stop "decoding" GooString's UTF-16BE to native byte order in

  detail::unicode_GooString_to_ustring(GooString *str)

or to use a source encoding based on the BYTE_ORDER macro instead of just
"UTF-16", or to check the BOM character output by iconv(3) (which e.g.

  ustring::from_utf8(const char *str, int len)

currently skips).
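
The last two options could look roughly like the sketch below. It is not
poppler code: native_utf16_encoding and detect_utf16_bom are hypothetical
names, and the byte order is probed at run time rather than through the
BYTE_ORDER macro, purely to keep the example portable. It assumes the iconv
implementation accepts the explicit names "UTF-16LE" and "UTF-16BE".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return an explicit-endian encoding name that matches
// the host byte order, so iconv never has to guess what "UTF-16" means.
const char *native_utf16_encoding()
{
    const std::uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1); // lowest-addressed byte of the probe
    return first == 0x01 ? "UTF-16BE" : "UTF-16LE";
}

enum class Bom { None, BigEndian, LittleEndian };

// Hypothetical helper: inspect the first two bytes of a UTF-16 buffer
// (e.g. iconv output) instead of skipping them blindly.
Bom detect_utf16_bom(const unsigned char *data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    if (len >= 2) {
        if (data[0] == 0xFE && data[1] == 0xFF) {
            return Bom::BigEndian;
        }
        if (data[0] == 0xFF && data[1] == 0xFE) {
            return Bom::LittleEndian;
        }
    }
    return Bom::None;
}
```

With such a helper, to_utf8() could construct its converter as
MiniIconv ic("UTF-8", native_utf16_encoding()), and from_utf8() could use the
detected BOM to decide whether the code units need a byte swap.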
