[poppler] pdftotext ignore encoding

Norayr Chilingarian norayr at arnet.am
Tue May 10 19:19:17 UTC 2016


Hello,

I am using pdftotext on a PDF file that uses a rare, old 8-bit encoding.
By default pdftotext behaves as if -enc UTF-8 were given, so the 8-bit
characters become multibyte sequences in the output text file.

I need to preserve that encoding, and I can handle or convert it later
if necessary. Is there a way to tell the pdftotext utility to copy the
characters as is, in this 8-bit encoding?

I have tried different -enc options. The best results are with Latin1,
but even then not all of the letters are copied to the resulting text
file.
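
The effect described above can be illustrated with a short Python
sketch. It is plain Python, not poppler code, and the sample characters
and the helper name to_latin1_or_none are hypothetical, chosen only for
illustration. It shows that a code point in the upper 8-bit range takes
two bytes in UTF-8 but one byte in Latin-1, and that a character outside
U+0000..U+00FF has no Latin-1 byte at all, which may be why some letters
go missing with -enc Latin1.

```python
# Illustration only: how one character is written under different
# output encodings. This does not call pdftotext or poppler.

ch = chr(0xE9)  # U+00E9, an arbitrary code point above 127

utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = ch.encode("latin-1")  # one byte: 0xE9


def to_latin1_or_none(text):
    """Return text as Latin-1 bytes, or None if a character does not fit."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


# U+0531 is an arbitrary example of a character outside U+0000..U+00FF;
# it cannot be represented as a single Latin-1 byte.
outside = to_latin1_or_none(chr(0x0531))

print(utf8_bytes, latin1_bytes, outside)
```

Here the first value is two bytes long, the second is one byte long, and
the third is None because the character does not exist in Latin-1.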

I need to tell pdftotext not to convert anything and to just ignore the
encoding, or at least to pass characters in the range 127..255 through
as is, without conversion.

Is it possible?

Thank you.

