[poppler] pdftotext raw

Massimo Redaelli mredaelli at lari.digital
Thu May 16 15:00:27 UTC 2019


Hey all.

Question regarding pdftotext.

The help says that `-raw` is no longer recommended, but for every PDF
I have tried it actually gives better results than the default mode:
paragraphs are not interrupted by extraneous text such as headers or
boxes.
(I do have to handle hyphenated words, but that looks easy.)
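For the hyphenation, a rough Python sketch of what I have in mind is below.
It assumes `pdftotext` is on the PATH, and it treats any line that ends in a
hyphen followed by a lowercase letter as a soft line-break hyphen. That means
genuine compounds split at a line break ("well-" / "known") get joined too.
The function names are my own, not part of poppler.

```python
import re
import subprocess

# A word character, a hyphen at end of line, the line break, then a
# lowercase letter (lookahead, so the letter itself is not consumed).
# Assumption: such a hyphen is a soft hyphen inserted by line wrapping.
_HYPHEN_BREAK = re.compile(r"(\w)-\r?\n(?=[a-z])")


def dehyphenate(text: str) -> str:
    """Join words that were split across lines with a trailing hyphen."""
    # Keep the last letter before the hyphen; drop the hyphen and newline.
    return _HYPHEN_BREAK.sub(r"\1", text)


def extract_raw(pdf_path: str) -> str:
    """Run `pdftotext -raw` and return dehyphenated text."""
    # An output file name of "-" makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-raw", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return dehyphenate(result.stdout)
```

Restricting the match to a following lowercase letter leaves things like
"A-" / "B" at a line break untouched, at the cost of missing words that
continue with a capital.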

Is the option going to be deprecated, or can we count on it being
there for the foreseeable future?
Are there reasons not to use it?

Thanks!

-- 
M.

