[poppler] (preferably Linux-based, OS) utility to extract images from image-based pdf files ...

William Bader williambader at hotmail.com
Tue Sep 3 19:38:41 UTC 2019


If you don't find a ready-made solution, you could run an OCR tool that reports the x/y positions of words, such as 'cuneiform -l eng -f hocr', and then look for areas of the page that contain no words. Those holes are the likely locations of the pictures.
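
As a rough sketch of the "look for holes" step, the Python below reads bounding boxes out of hOCR text and reports horizontal bands of the page where no text was recognized. It assumes the OCR output marks lines or words with class 'ocr_line' or 'ocrx_word', that the class attribute comes before a title attribute containing "bbox x0 y0 x1 y1", and that you know the page height in pixels; the function names and the 100-pixel threshold are made up for illustration, and the exact markup your OCR tool emits may differ.

```python
import re

# Assumed hOCR layout: class='ocr_line' or class='ocrx_word', followed in the
# same tag by a title attribute that contains "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(
    r"class=['\"](?:ocr_line|ocrx_word)['\"][^>]*?bbox (\d+) (\d+) (\d+) (\d+)"
)


def text_boxes(hocr):
    """Return (x0, y0, x1, y1) tuples for every matched line or word element."""
    return [tuple(int(v) for v in m.groups()) for m in BBOX_RE.finditer(hocr)]


def vertical_holes(boxes, page_height, min_gap=100):
    """Return (top, bottom) bands at least min_gap pixels tall with no text."""
    holes = []
    cursor = 0  # bottom edge of the text seen so far
    for _, y0, _, y1 in sorted(boxes, key=lambda b: b[1]):
        if y0 - cursor >= min_gap:
            holes.append((cursor, y0))
        cursor = max(cursor, y1)
    if page_height - cursor >= min_gap:
        holes.append((cursor, page_height))
    return holes
```

Each band it returns is only a candidate: it may be a cartoon, but it may also be a blank margin, and a picture sitting beside a column of text will not show up because only full-width gaps are detected. The bands could then be cropped out of the whole-page image that pdfimages produces.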

________________________________
From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Albretch Mueller <lbrtchx at gmail.com>
Sent: Tuesday, September 3, 2019 11:36 AM
To: poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
Subject: [poppler] (preferably Linux-based, OS) utility to extract images from image-based pdf files ...

When the input is a non-searchable, image-based PDF file, pdfimages
outputs each page as a single whole-page image. Take for example:

 https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf

 Which utility would detect the cartoons on pages 6 and 7?

 lbrtchx