[poppler] pdftotext and pdftohtml and extracting text

Sun Aug 27 17:36:01 UTC 2017

Hi Leonard,

On Sun, Aug 27, 2017 at 11:38 AM, Leonard Rosenthol <lrosenth at adobe.com> wrote:
> Why would an image only PDF (or an Image + a space) be a bad thing?

That's a good point. It isn't a bad thing in and of itself, but
virtually every malicious PDF is created this way.

> Checking the links in a PDF – regardless of the content – certainly seems like a reasonable thing to do, however.

Malicious PDFs also typically have only one URL.

There's no reason not to check every URL, but I'd also like to find a
distinguishing pattern, if possible, that flags zero-day or one-off
URLs used in a spear-phishing campaign. That would give us a little
bit of an advantage.
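
To make the pattern concrete, here is a rough sketch of the check I have
in mind: no visible text (or only whitespace), at least one image, and
exactly one distinct URL. The function names are hypothetical, and it
assumes the poppler tools pdftotext and pdfimages are on the PATH. How
the URLs get pulled out of the PDF is deliberately left to the caller.

```python
import subprocess


def looks_suspicious(text, image_count, urls):
    """Return True for the pattern discussed above: no visible text,
    at least one image, and exactly one distinct URL."""
    # strip() also removes the form feeds pdftotext emits between pages,
    # so "an image plus a space" still counts as having no text.
    has_text = bool(text.strip())
    return (not has_text) and image_count >= 1 and len(set(urls)) == 1


def extract_text(pdf_path):
    """Run `pdftotext file.pdf -`, which writes the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_images(pdf_path):
    """Count the rows printed by `pdfimages -list`.

    Assumption: the listing starts with two header lines (column names
    and a dashed rule), followed by one row per image.
    """
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return max(0, len(result.stdout.splitlines()) - 2)


def check_pdf(pdf_path, urls):
    """Combine the helpers; `urls` is whatever list of links the caller
    has already extracted from the PDF."""
    return looks_suspicious(extract_text(pdf_path), count_images(pdf_path), urls)
```

This is only a heuristic for prioritizing which URLs to look at first.
A legitimate scanned document with a single link would match too, so
every URL still needs to be checked.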

Alex