[poppler] pdftotext and pdftohtml and extracting text
Alex
mysqlstudent at gmail.com
Sun Aug 27 17:36:01 UTC 2017
Hi Leonard,
On Sun, Aug 27, 2017 at 11:38 AM, Leonard Rosenthol <lrosenth at adobe.com> wrote:
> Why would an image only PDF (or an Image + a space) be a bad thing?
That's a good point. I guess it wouldn't be, in and of itself, but
virtually every malicious PDF is created this way.
> Checking the links in a PDF – regardless of the content – certainly seems like a reasonable thing to do, however.
Malicious PDFs also typically have only one URL.
There's no reason not to check every URL, but I'd also like to find a
distinctive pattern, if possible, that identifies zero-day or unique
URLs used in a spear-phishing campaign, which would give us a little
bit of an advantage.
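
To make the pattern concrete, here is a rough sketch of the check I have in mind: no extractable text (or only whitespace) plus exactly one distinct URL. The function name and the max_text_chars threshold are just placeholders of mine, and it assumes the text (e.g. from pdftotext) and the list of link URLs have already been pulled out of the PDF by some other step.

```python
from typing import Iterable


def looks_suspicious(extracted_text: str,
                     urls: Iterable[str],
                     max_text_chars: int = 0) -> bool:
    """Hypothetical heuristic, not a real detector.

    Flags a PDF when it has (almost) no extractable text and
    exactly one distinct URL.
    """
    # Drop all whitespace, including form feeds between pages,
    # so an "image + a space" PDF counts as having no text.
    visible = "".join(extracted_text.split())

    # Collapse duplicates: the same link repeated still counts as one URL.
    distinct_urls = {u.strip() for u in urls if u.strip()}

    return len(visible) <= max_text_chars and len(distinct_urls) == 1
```

A match would only be a reason to look at that one URL more closely, since, as Leonard says, an image-only PDF isn't bad in and of itself.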
Alex