[poppler] Using PDF tools to extract links from images?

John LaCour jal at phishlabs.com
Wed Jan 9 20:29:59 UTC 2019


I have a number of PDFs containing images that are hyperlinked to URIs. For example, I see this in the PDF file:

<</Subtype/Link/Rect[ 167.3 515.83 429.43 561.1] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(https://host.com/path/file.php) >>/StructParent 1>>

I want to extract the URI:  https://host.com/path/file.php

It is easy enough to parse out the URL if I can get the tools to output something containing the URI, but none of the poppler PDF tools seems to print it.
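
To illustrate the parsing step, here is a minimal Python sketch that pulls the URI out of an annotation dictionary like the one quoted above. It assumes the dictionary text is already available uncompressed, and that the URI is a PDF literal string with no nested or escaped parentheses. The names URI_ENTRY and extract_uris are hypothetical, chosen only for this example.

```python
import re
from typing import List

# Matches an entry such as /URI(https://host.com/path/file.php).
# Assumption: the value is a literal string without nested or escaped
# parentheses; hex strings (<...>) are not handled.
URI_ENTRY = re.compile(rb"/URI\s*\(([^()\\]*)\)")


def extract_uris(raw: bytes) -> List[str]:
    """Return every URI found in /URI(...) entries of raw PDF bytes."""
    return [m.group(1).decode("latin-1").strip() for m in URI_ENTRY.finditer(raw)]


# The annotation dictionary quoted in the message above.
sample = (
    b"<</Subtype/Link/Rect[ 167.3 515.83 429.43 561.1] /BS<</W 0>>/F 4"
    b"/A<</Type/Action/S/URI/URI(https://host.com/path/file.php) >>"
    b"/StructParent 1>>"
)

print(extract_uris(sample))  # prints ['https://host.com/path/file.php']
```

The pattern skips the action type name (/S/URI) because that occurrence is not followed by an opening parenthesis.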

Is there any way to extract this information with the poppler PDF tools?    Am I missing something?

Is there a PDF tool that will just give a raw dump of the contents (handling the decompression, field markers, etc.)?
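
As a rough idea of what "handling the decompression" could look like without a dedicated tool, the Python sketch below inflates every stream it can find in a file. This is not a poppler feature, only a crude illustration. It assumes the streams use plain FlateDecode (zlib) with no encryption and no other filters, and it locates streams by keyword rather than by parsing the cross-reference table. The function name inflate_streams and the fake_pdf test data are hypothetical.

```python
import re
import zlib
from typing import List

# Everything between "stream<EOL>" and the next "endstream" keyword.
# Assumption: the compressed data itself never contains "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the inflated contents of every zlib-compressed stream."""
    inflated = []
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            inflated.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not zlib data (uncompressed or another filter); skip it.
            continue
    return inflated


# Small self-made example standing in for a real PDF file.
payload = b"<</S/URI/URI(https://host.com/path/file.php) >>"
fake_pdf = (
    b"1 0 obj\n<</Filter/FlateDecode>>\nstream\n"
    + zlib.compress(payload)
    + b"\nendstream\nendobj\n"
)

for chunk in inflate_streams(fake_pdf):
    print(chunk)
```

For a real file, the input would be read with open(path, "rb").read(), and the inflated chunks could then be searched for /URI entries.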

Thanks
John
