[poppler] reconsideration about iconv() in cpp frontend
suzuki toshiya
mpsuzuki at hiroshima-u.ac.jp
Thu Apr 26 17:03:40 UTC 2018
Dear Adam,
Thank you for review & comment, and sorry for lated response.
Adam Reichold wrote:
> I like it. Some minor remarks:
>
> * Is this the only user of iconv within Poppler? If so, we could remove
> it from the build system as wel.
OK.
> * In "ustring::to_utf8":
> - "utf16_end" is only used to compute the size once, maybe just use
> "size()" to call "utf16ToUtf8"
It is slightly different, it is used to compute the length without BOM.
Also the first UTF-16 character to be passed to utf16ToUtf8() is changed
to skip the BOM. To eliminate it, the code would be like this. This
is better?
byte_array ustring::to_utf8() const
{
if (!size()) {
return byte_array();
}
const uint16_t* utf16_buff = data();
const bool hasBOM = (utf16_buff[0] == 0xFEFF);
const int utf8_len = utf16CountUtf8Bytes(utf16_buff + (hasBOM ? 1 : 0));
byte_array ret(utf8_len + 1, '\0');
utf8_len = utf16ToUtf8(utf16_buff + (hasBOM ? 1 : 0),
reinterpret_cast<char *>(ret.data()),
utf8_len + 1,
size() - (hasBOM ? 1 : 0));
ret.resize(utf8_len);
return ret;
}
> - "utf16ToUtf8" returns how many bytes were written, so no need to
> call "std::strlen" again
OK.
> * In "ustring::to_latin1":
> - for readability, I'd suggest either iterating both "me" and "ret"
> using a parallel index "i" or via pointer arithmetic, but not mixing it
how about this?
std::string ustring::to_latin1() const
{
if (!size()) {
return std::string();
}
const bool hasBOM = (*data() == 0xFEFF);
std::string ret(size() + (hasBOM ? 0 : 1), '\0');
iterator itl_utf16 = begin() + (hasBOM ? 1 : 0);
iterator itl_ret = ret.begin();
for (; itl_utf16 < end(); itl_utf16++, itl_latin1++) {
*itl_ret = (char)*itl_utf16;
}
return ret;
}
> * In "has_bom_utf8":
> - The spacing of the first "if" is a bit off
> - Instead of "if(foo) { return true; } return false;", just do "return
> foo;"
OK.
> * In "ustring::from_utf8":
> - Three times "skip BOM" is probably a bit on the noisy side ;-)
OK.
> - Why no additional call to "ret.resize" as for "to_utf8"? Or why the
> additional call there?
ret.resize() added.
> Just for my understanding, this also contains another important fix that
> is unrelated to the iconv usage?
>
> - u = data[i] & 0xff;
> + u = pdfDocEncoding[ *data & 0xff ];
Ah, yes. I had once explained in the beginning of this thread
for reworking, but slipped to mention it for the patch.
> Maybe the changes to "detail::unicode_GooString_to_ustring" could be a
> separate commit?
OK, it is separated to fix-cpp-frontend-endian-issue_20180427-0130-b.
Others are summarized in fix-cpp-frontend-endian-issue_20180427-0130-a.
Regards,
mpsuzuki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-cpp-frontend-endian-issue_20180427-0130-a.diff.xz
Type: application/octet-stream
Size: 2724 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180427/b9898b74/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-cpp-frontend-endian-issue_20180427-0130-b.diff.xz
Type: application/octet-stream
Size: 668 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180427/b9898b74/attachment-0001.obj>
More information about the poppler
mailing list