[poppler] reconsideration about iconv() in cpp frontend

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Thu Apr 26 17:03:40 UTC 2018


Dear Adam,

Thank you for review & comment, and sorry for lated response.

Adam Reichold wrote:
> I like it. Some minor remarks:
> 
> * Is this the only user of iconv within Poppler? If so, we could remove
> it from the build system as wel.

OK.

> * In "ustring::to_utf8":
>   - "utf16_end" is only used to compute the size once, maybe just use
> "size()" to call "utf16ToUtf8"

It is slightly different, it is used to compute the length without BOM.
Also the first UTF-16 character to be passed to utf16ToUtf8() is changed
to skip the BOM. To eliminate it, the code would be like this. This
is better?

byte_array ustring::to_utf8() const
{
    if (!size()) {
        return byte_array();
    }

    const uint16_t* utf16_buff = data();
    const bool hasBOM = (utf16_buff[0] == 0xFEFF);
    const int utf8_len = utf16CountUtf8Bytes(utf16_buff + (hasBOM ? 1 : 0));

    byte_array ret(utf8_len + 1, '\0');
    utf8_len = utf16ToUtf8(utf16_buff + (hasBOM ? 1 : 0),
                        reinterpret_cast<char *>(ret.data()),
                        utf8_len + 1,
                        size() - (hasBOM ? 1 : 0));
    ret.resize(utf8_len);

    return ret;
}

>   - "utf16ToUtf8" returns how many bytes were written, so no need to
> call "std::strlen" again

OK.

> * In "ustring::to_latin1":
>   - for readability, I'd suggest either iterating both "me" and "ret"
> using a parallel index "i" or via pointer arithmetic, but not mixing it

how about this?
std::string ustring::to_latin1() const
{
    if (!size()) {
        return std::string();
    }

    const bool hasBOM = (*data() == 0xFEFF);
    std::string ret(size() + (hasBOM ? 0 : 1), '\0');
    iterator itl_utf16 = begin() + (hasBOM ? 1 : 0);
    iterator itl_ret = ret.begin();

    for (; itl_utf16 < end(); itl_utf16++, itl_latin1++) {
        *itl_ret = (char)*itl_utf16;
    }
    return ret;
}


> * In "has_bom_utf8":
>   - The spacing of the first "if" is a bit off
>   - Instead of "if(foo) { return true; } return false;", just do "return
> foo;"

OK.

> * In "ustring::from_utf8":
>   - Three times "skip BOM" is probably a bit on the noisy side ;-)

OK.

>   - Why no additional call to "ret.resize" as for "to_utf8"? Or why the
> additional call there?

ret.resize() added.

> Just for my understanding, this also contains another important fix that
> is unrelated to the iconv usage?
> 
> -            u = data[i] & 0xff;
> +            u = pdfDocEncoding[ *data & 0xff ];

Ah, yes. I had once explained in the beginning of this thread
for reworking, but slipped to mention it for the patch.

> Maybe the changes to "detail::unicode_GooString_to_ustring" could be a
> separate commit?

OK, it is separated to fix-cpp-frontend-endian-issue_20180427-0130-b.
Others are summarized in fix-cpp-frontend-endian-issue_20180427-0130-a.

Regards,
mpsuzuki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-cpp-frontend-endian-issue_20180427-0130-a.diff.xz
Type: application/octet-stream
Size: 2724 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180427/b9898b74/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-cpp-frontend-endian-issue_20180427-0130-b.diff.xz
Type: application/octet-stream
Size: 668 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180427/b9898b74/attachment-0001.obj>


More information about the poppler mailing list