[poppler] pdftotext ignore encoding
Norayr Chilingarian
norayr at arnet.am
Tue May 10 19:19:17 UTC 2016
Hello,
I am using pdftotext on a PDF file that uses a rare, old 8-bit encoding.
By default pdftotext behaves as if -enc UTF-8 were given, so the 8-bit
characters become multibyte sequences in the output text file.
I need to preserve that encoding; I can handle or convert it later if
necessary. Is there a way to tell the pdftotext utility to copy the
characters as is, in this 8-bit encoding?
I have tried different -enc options. The best results are with Latin1,
but even then not all the letters make it into the resulting text file.
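To illustrate the byte-level effect I mean, here is a minimal Python sketch. It is plain Python, not pdftotext's code, and it assumes the extracted characters are Unicode code points in the range 128..255; the sample range and the U+0561 character are arbitrary choices for the example. It may also hint at why Latin1 loses letters, if some characters map outside that range, but that is only a guess.

```python
# Illustration only: plain Python, not what pdftotext does internally.
# Assumption: the text holds Unicode code points in the range 128..255.
text = "".join(chr(c) for c in range(0xE0, 0xE6))  # six characters, U+00E0..U+00E5

utf8_bytes = text.encode("utf-8")      # each character becomes two bytes
latin1_bytes = text.encode("latin-1")  # each character stays one byte

print(len(text), len(utf8_bytes), len(latin1_bytes))

# With Latin-1 the byte values equal the original code points.
byte_values_preserved = list(latin1_bytes) == list(range(0xE0, 0xE6))
print(byte_values_preserved)

# A character outside Latin-1 (U+0561, picked arbitrarily) has no
# single-byte form there, so a strict encoder refuses it.
try:
    "\u0561".encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

So a one-byte-per-character output is only possible for characters that the chosen output encoding actually contains.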
I need to tell pdftotext not to convert anything and simply ignore the
encoding, or at least to pass characters in the range 127..255 through
as is, without conversion.
Is it possible?
Thank you.