[poppler] c++ ustring encoding still completely broken

Jeroen Ooms jeroen at berkeley.edu
Sat Dec 1 22:20:46 UTC 2018


I maintain the poppler bindings for the R programming language and get
a lot of bug reports about corrupted text extracted with poppler.
Below is a minimal example that illustrates the problem:

  git clone https://github.com/jeroen/popplertest
  cd popplertest
  g++ -std=c++11 encoding.cpp -o encoding $(pkg-config --cflags --libs poppler-cpp)
  ./encoding hello.pdf

The output shows a lot of Chinese characters, which is incorrect (all
text in the PDF is English).
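
One plausible cause, offered here as an assumption rather than a
confirmed diagnosis, is that UTF-16 code units are being read with the
wrong byte order somewhere between extraction and output. The sketch
below does not call poppler at all; the helper names are hypothetical
and only show why byte-swapped ASCII would show up as Chinese
characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Swap the two bytes of a single UTF-16 code unit.
static std::uint16_t swap_bytes(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// True if the code unit lies in the CJK Unified Ideographs block
// (U+4E00..U+9FFF).
static bool is_cjk_ideograph(std::uint16_t unit) {
    return unit >= 0x4E00 && unit <= 0x9FFF;
}

// Widen ASCII text to UTF-16 code units and byte-swap each one,
// simulating a reader that assumes the wrong endianness.
// Hypothetical helper for illustration; not part of poppler.
static std::vector<std::uint16_t> misread_as_swapped_utf16(const std::string &ascii) {
    std::vector<std::uint16_t> out;
    for (unsigned char c : ascii) {
        assert(c < 0x80);  // precondition: input is plain ASCII
        out.push_back(swap_bytes(c));
    }
    return out;
}
```

For example, 'h' is 0x0068 in UTF-16; with its bytes swapped it becomes
0x6800, which falls inside the CJK Unified Ideographs block. The same
holds for the other lowercase letters in "hello".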

Back in March 2018, Suzuki Toshiya posted a patch with at least a
partial solution:
https://lists.freedesktop.org/archives/poppler/2018-March/012962.html
I hope we can revisit this.

