[poppler] poppler::ustring encoding issue

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Thu Apr 12 08:33:33 UTC 2018


Dear Jeroen,

Please let me prepare some data for a regression test.
The data I have tested so far is mainly ASCII or UTF-16BE.
I should also check the PDFDocEncoding cases (if anybody already has
something appropriate, please let me know).

Regards,
mpsuzuki

Jeroen Ooms wrote:
> FYI, the encoding problems still exist in the master branch today. I am
> very interested in this patch by mpsuzuki. What can we do to move this
> forward?
> 
> On Wed, Mar 28, 2018 at 2:26 PM, suzuki toshiya
> <mpsuzuki at hiroshima-u.ac.jp> wrote:
>> Dear Adam,
>>
>> Adam Reichold wrote:
>>>> I see. Where is the appropriate place to add documentation for
>>>> the poppler::ustring class itself?
>>> Personally, I would suggest Doxygen comments in the public header.
>> Thanks! I am now trying to write them. I also found that the
>> Doxygen comments for text_list need improvement.
>>
>> While checking the existing functions (in order to document them),
>> I found a few inconsistencies in how the BOM is handled.
>>
>> * ustring::to_latin1(): this function does not use iconv();
>> it just casts each unsigned short to char. A BOM cannot be
>> converted to Latin-1, but the existence of a BOM is not
>> checked. If the stored UTF-16 has a BOM, a broken 8-bit
>> character is inserted at the beginning of the result.
>>
>> * ustring::from_latin1(): this function does not use iconv()
>> either. No BOM is inserted at the beginning, so a UTF-16
>> string without a BOM is created.
>>
>> * ustring::to_utf8(): whether a BOM is emitted is decided by iconv().
>>
>> * ustring::from_utf8(): assumes that iconv() returns UTF-16 with a BOM.
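
The cast-only behaviour described for to_latin1() and from_latin1() can be
illustrated with a small standalone sketch. It does not use poppler at all:
utf16_units, naive_to_latin1() and naive_from_latin1() are hypothetical names
that only mimic the behaviour described above, on the assumption that each
UTF-16 code unit is simply truncated to (or widened from) one byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the UTF-16 code units held by a ustring.
using utf16_units = std::vector<unsigned short>;

// Cast-only conversion: each 16-bit unit is truncated to 8 bits.
// Nothing checks whether the first unit is a BOM (0xFEFF).
std::string naive_to_latin1(const utf16_units &s)
{
    std::string out(s.size(), '\0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        out[i] = static_cast<char>(s[i]);
    }
    return out;
}

// Cast-only conversion in the other direction: each byte is widened
// to one 16-bit unit and no BOM is prepended.
utf16_units naive_from_latin1(const std::string &s)
{
    utf16_units out(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        out[i] = static_cast<unsigned char>(s[i]);
    }
    return out;
}
```

With the input { 0xFEFF, 'A', 'B' }, naive_to_latin1() returns three bytes
rather than two, and on common platforms the first one is the stray byte 0xFF
left over from the BOM. naive_from_latin1("AB") returns exactly two units,
with no BOM in front.
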
>>
>> I will collect the Debian packages that depend on libpoppler-cpp
>> and check how they use ustring objects. From a rough check there
>> are fewer than 10, so checking all of them would not be very
>> time-consuming. If some software always skips the first
>> character of the UTF-16 data (on the assumption that a ustring
>> always holds UTF-16 with a BOM), some discussion is needed.
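
To make the concern about client code concrete: skipping the first code unit
unconditionally drops a real character whenever the string has no BOM (for
example one built without iconv()). A minimal sketch of a conditional skip
follows. It is independent of poppler, and utf16_units, text_start() and
without_bom() are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the UTF-16 code units held by a ustring.
using utf16_units = std::vector<unsigned short>;

// Index of the first real character: 1 if the string starts with a
// BOM (0xFEFF), otherwise 0.
std::size_t text_start(const utf16_units &s)
{
    return (!s.empty() && s[0] == 0xFEFF) ? 1 : 0;
}

// Copy of the string with a leading BOM removed, if there is one.
// A string without a BOM is returned unchanged.
utf16_units without_bom(const utf16_units &s)
{
    return utf16_units(s.begin() + static_cast<std::ptrdiff_t>(text_start(s)),
                       s.end());
}
```

A caller written this way gives the same text for both the with-BOM and the
no-BOM forms, so it would not depend on which convention ustring follows.
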
>>
>> Regards,
>> mpsuzuki
> 


