[poppler] reconsideration about iconv() in cpp frontend
Adam Reichold
adam.reichold at t-online.de
Sun Apr 15 09:15:56 UTC 2018
Hello,
Am 15.04.2018 um 05:08 schrieb suzuki toshiya:
> Hi,
>
> Now I'm reworking with cpp-frontend encoding issue, and found that:
> previous patch fixes the issue for text drawn in a PDF, but does
> not fix the broken metadata if it is coded by PDFDocEncoding
> (sample PDF is attached).
>
> During the rework, I'm becoming questionable about the usage of
> iconv() in cpp frontend, by following reasons.
>
> 1) currently, the usage of iconv() in cpp frontend is only the
> conversion of Unicode: UTF-8 <-> UTF-16. This is completely
> mathematical conversion, using iconv() could be overkill.
>
> in fact, poppler has its own conversion routines (see UTF.h),
> which are independent from iconv().
>
> 2) iconv() requires the allocated buffer, but does not help
> the calculation of the required buffer size.
>
> because this difficulty, current usages of iconv() have ugly
> fallbacks for the cases that prepared buffer is too short,
> like this:
>
> size_t ir = iconv(ic, (ICONV_CONST char **)&me_data, &me_len_char,
> &str_data, &str_len_left);
> if ((ir == (size_t)-1) && (errno == E2BIG)) {
> const size_t delta = str_data - &str[0];
> str_len_left += str.size();
> str.resize(str.size() * 2);
> str_data = &str[delta];
> ir = iconv(ic, (ICONV_CONST char **)&me_data, &me_len_char, &str_data,
> &str_len_left);
> if (ir == (size_t)-1) {
> return byte_array();
> }
> }
> str.resize(str.size() - str_len_left);
> return str;
>
> maybe due to this difficulty, glib or Qt frontends do not use
> iconv() directly, they use better utility functions in these
> frameworks.
>
> the functions in UTF.h have the functions to calculate the
> required buffer size from given UTF-8 or UTF-16 data.
>
> 3) iconv() receives the buffer filled by, or to be filled by
> UTF-16 data. currently, casting by reinterpret_cast<char *>()
> is used to interchange between ustring (the array of unsigned
> short) and a byte buffer, but it is not theoretically portable.
> On the systems whose short is greater than 16-bit, or its
> alignment is greater than 16-bit, the casted byte buffer would
> have unexpected data from MSB bits or padding, and it could
> make iconv() confused. using uint16_t (as the functions in UTF.h)
> would be safer.
>
> To do in a theoretically portable way, we should allocate the
> byte buffer for UTF-16 data, and copy to or copy from it,
> instead of the casting. This would be uneasy for the maintenance.
>
> ==
>
> How could I resolve this issue? There might be a few candidates.
>
> A) use the functions in UTF.h, and change ustring class
> to fit them.
> - pros: easiest maintainability, minimum duplication.
> - cons: API changed, and the memory management by gmem.h
> would be introduced in cpp frontend, it would be less
> consistent in cpp frontend internal.
>
> B) rewrite the functions in UTF.h by template programming
> and have the interfaces fitting to current ustring().
> - pros: minimum duplication, good consistency, API kept.
> - cons: maintainability would be decreased slightly.
>
> C) write diversions of the functions in UTF.h in cpp frontend.
> - pros: acceptable maintainability, good consistency, API kept.
> - cons: duplication of almost same code into 2 places.
I think what is best here is hard to decide without seeing actual code
that at least starts in one of these directions.
Concerning ustring, I think an API change is not possible except for
changing unsigned short to uint16_t since it already was broken where
sizeof(unsigned short) != 2.
Best regards,
Adam
> I hope to hear the comments from experts.
>
> Regards,
> mpsuzuki
>
>
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180415/4649388e/attachment-0001.sig>
More information about the poppler
mailing list