[poppler] poppler::ustring encoding issue

Jeroen Ooms jeroen at berkeley.edu
Tue Mar 6 11:59:42 UTC 2018


On Tue, Mar 6, 2018 at 10:31 AM, Adam Reichold
<adam.reichold at t-online.de> wrote:
> Hello mpsuzuki,
>
> from a glance at the code, it seems page::text uses ustring::from_utf8
> to convert Poppler's GooString into ustring which seems correct if
> GlobalParams::textEncoding has its default value of "UTF-8".

I don't understand this part. Why is textEncoding a global property?
Shouldn't it be a property of a single PDF document? Is there some way
I can read a document's encoding from the C++ API (without including
GlobalParams.h)?

The PDF spec states that different strings may have different
encodings. Perhaps it would be possible to expose an encoding field in
the ustring class? If there were a way to know the encoding of a
ustring, I could get the raw data and convert it to a suitable encoding
myself. This would be much better than making assumptions.
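For illustration, here is a minimal sketch of what "convert it myself" could look like, assuming the raw, undecoded bytes of a PDF text string were available as a std::string. No such accessor exists in the poppler C++ API today; that is exactly what is being asked for. The sketch follows the spec's rule for text strings: they are either PDFDocEncoding or UTF-16BE prefixed with the byte-order mark FE FF, and PDF 2.0 additionally allows UTF-8 prefixed with EF BB BF. The names PdfTextEncoding, detect_encoding and utf16be_to_utf8 are hypothetical and are not part of poppler. PDFDocEncoding decoding (which needs a lookup table) is left out. Note that text drawn on a page is decoded through font encodings and CMaps, so this BOM rule does not apply to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodings a PDF text string may use (UTF-8 with BOM was added in PDF 2.0).
enum class PdfTextEncoding { PDFDocEncoding, UTF16BE, UTF8 };

// Hypothetical helper: inspect the byte-order mark of a raw PDF text string.
PdfTextEncoding detect_encoding(const std::string& raw) {
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(raw[i]); };
    if (raw.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return PdfTextEncoding::UTF16BE;
    if (raw.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return PdfTextEncoding::UTF8;
    return PdfTextEncoding::PDFDocEncoding;  // no BOM: PDFDocEncoding
}

// Append one Unicode code point to `out` encoded as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a BOM-prefixed UTF-16BE string to UTF-8,
// combining surrogate pairs; unpaired surrogates become U+FFFD.
std::string utf16be_to_utf8(const std::string& raw) {
    assert(detect_encoding(raw) == PdfTextEncoding::UTF16BE);
    auto unit = [&](std::size_t i) {
        return static_cast<std::uint32_t>(
            (static_cast<unsigned char>(raw[i]) << 8) |
            static_cast<unsigned char>(raw[i + 1]));
    };
    std::string out;
    for (std::size_t i = 2; i + 1 < raw.size(); i += 2) {  // start after FE FF
        std::uint32_t cp = unit(i);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < raw.size()) {
            std::uint32_t lo = unit(i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;  // consumed the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For example, the raw bytes FE FF 00 E9 are detected as UTF-16BE and convert to the UTF-8 bytes C3 A9 ("é"). Dispatching on the BOM per string, rather than on one global setting, matches the per-string encoding the spec describes.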

