<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - &quot;UTF-16&quot; not native byte order on OS X iconv (re ustrings to_utf8)"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=96313">96313</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>"UTF-16" not native byte order on OS X iconv (re ustrings to_utf8)
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>poppler
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>cpp frontend
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>poppler-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dev@karlchenofhell.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Hi.

ustring::to_utf8() creates a

  MiniIconv ic("UTF-8", "UTF-16");

assuming that iconv(3) uses the native byte order for "UTF-16". On OS X with
Intel CPUs that assumption does not hold (I installed poppler through MacPorts,
but the issue is not specific to MacPorts, see below), as a quick

  $ echo -n 7 | iconv -t utf-16 |  hexdump -C
  00000000  fe ff 00 37                                       |...7|

reveals: it's UTF-16BE.

This breaks page labels for me: instead of "78" (UTF-8) they come back as the
(hex) bytes

  e3 9c 80 e3 a0 80

which is UTF-8 for the code units 0x3700 0x3800, i.e. 0x0037 0x0038 ("7", "8")
with their bytes swapped.
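
The byte values above can be reproduced outside poppler. A minimal Python
sketch (not poppler code), assuming only that the UTF-16BE bytes for "78" are
misread with the opposite byte order before being converted to UTF-8; the
helper name swapped_utf16_to_utf8 is made up for illustration:

```python
def swapped_utf16_to_utf8(text):
    """Encode text as UTF-16BE, misread the bytes as UTF-16LE, re-encode as UTF-8."""
    big_endian = text.encode("utf-16-be")     # "78" gives bytes 00 37 00 38
    misread = big_endian.decode("utf-16-le")  # code units become 0x3700 0x3800
    return misread.encode("utf-8")


result = swapped_utf16_to_utf8("78")
print(result.hex(" "))  # e3 9c 80 e3 a0 80
```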


A fix might be one of the following:

  1. Do not "decode" GooString's UTF-16BE to native byte order in

       detail::unicode_GooString_to_ustring(GooString *str)

  2. Use a source encoding based on the BYTE_ORDER macro instead of just
     "UTF-16BE".

  3. Check the BOM character output by iconv(3), which e.g.

       ustring::from_utf8(const char *str, int len)

     currently skips.</pre>
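<pre>
The byte-order-based source encoding suggested above could look roughly like
this. It is a Python sketch of the idea only, not poppler code:
native_utf16_name is a hypothetical helper, and in the C++ frontend the same
choice would be made at compile time with the BYTE_ORDER macro. iconv spells
the two explicit encodings "UTF-16LE" and "UTF-16BE".

```python
import sys
from array import array


def native_utf16_name():
    """Return an explicit UTF-16 codec name matching the host byte order."""
    return "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


# Code units for "78" stored in native byte order, as an in-memory
# UTF-16 string would hold them.
units = array("H", [0x0037, 0x0038])

# Naming the byte order explicitly leaves nothing to the converter's default.
utf8 = units.tobytes().decode(native_utf16_name()).encode("utf-8")
print(utf8)  # b'78'
```

Because the encoding name carries the byte order, the result no longer depends
on what a given iconv implementation assumes for plain "UTF-16".
</pre>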
        </div>
      </p>


    </body>
</html>