<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
If you don't find a ready-made solution, you could run an OCR engine that reports the x/y positions of words, such as 'cuneiform -l eng -f hocr', and then look for regions of the page that contain no words.</div>
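<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Here is a minimal sketch of the "look for holes" step in Python, using only the standard library. It assumes that the hOCR file marks each recognised word or line with one of the classes <code>ocrx_word</code>, <code>ocr_word</code> or <code>ocr_line</code>, and stores its position as <code>bbox x0 y0 x1 y1</code> in the title attribute. Which classes appear depends on the OCR engine, so check cuneiform's actual output and adjust <code>TEXT_CLASSES</code> to match. The names <code>TextBoxCollector</code> and <code>find_text_free_bands</code> are hypothetical. To keep the sketch short, it only finds full-width horizontal bands with no text.</div>
<pre>
```python
import re
from html.parser import HTMLParser

# Assumption: recognised text is marked with one of these hOCR classes.
# Engines differ in which ones they emit, so adjust for your OCR output.
TEXT_CLASSES = {"ocrx_word", "ocr_word", "ocr_line"}
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class TextBoxCollector(HTMLParser):
    """Collect (x0, y0, x1, y1) boxes of text elements from hOCR markup."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        match = BBOX_RE.search(attrs.get("title") or "")
        if match and not TEXT_CLASSES.isdisjoint(classes):
            self.boxes.append(tuple(int(v) for v in match.groups()))


def find_text_free_bands(hocr, page_height, min_gap):
    """Return (top, bottom) y-ranges at least min_gap tall that hold no text.

    Only full-width horizontal bands are found: a picture placed beside
    a column of text does not show up as a hole.
    """
    collector = TextBoxCollector()
    collector.feed(hocr)
    collector.close()
    # Project every text box onto the vertical axis, sorted by top edge.
    spans = sorted((y0, y1) for _, y0, _, y1 in collector.boxes)
    bands = []
    covered_to = 0  # bottom edge of the text seen so far
    for y0, y1 in spans:
        if y0 - covered_to >= min_gap:
            bands.append((covered_to, y0))
        covered_to = max(covered_to, y1)
    if page_height - covered_to >= min_gap:
        bands.append((covered_to, page_height))
    return bands
```
</pre>
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
You would pass in the hOCR text for one page, the page height from that page's ocr_page bbox, and a min_gap of a few hundred pixels. Each returned (top, bottom) range, apart from blank top or bottom margins, is a candidate picture region. You could then crop that region from the page image that pdfimages extracted.</div>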
<div>
<div id="appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> poppler &lt;poppler-bounces@lists.freedesktop.org&gt; on behalf of Albretch Mueller &lt;lbrtchx@gmail.com&gt;<br>
<b>Sent:</b> Tuesday, September 3, 2019 11:36 AM<br>
<b>To:</b> poppler@lists.freedesktop.org &lt;poppler@lists.freedesktop.org&gt;<br>
<b>Subject:</b> [poppler] (preferably Linux-based, OS) utility to extract images from image-based pdf files ...</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">If the input is a non-searchable, image-based PDF file, the output of<br>
pdfimages is a whole-page image. Take for example:<br>
<br>
 <a href="https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf">https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf</a><br>
<br>
 Which utility would detect the cartoons on pages 6 and 7?<br>
<br>
 lbrtchx<br>
_______________________________________________<br>
poppler mailing list<br>
poppler@lists.freedesktop.org<br>
<a href="https://lists.freedesktop.org/mailman/listinfo/poppler">https://lists.freedesktop.org/mailman/listinfo/poppler</a></div>
</span></font></div>
</div>
</body>
</html>