[poppler] Help with pdfimages on USGS maps

Phil Endecott spam_from_poppler at chezphil.org
Thu Mar 19 14:17:14 PDT 2009


Dear All,

USGS maps can be downloaded, free, from their web site at 
http://store.usgs.gov/.  Here's an example of what you can get (17 
MByte file): http://chezphil.org/tmp/Boston_South_K42071C1_geo.PDF.  
It's a PDF from which pdfimages will happily extract a few hundred JPEGs.

What I'd like to do is to assemble a single large raster image (TIFF, 
JPEG, whatever) at the natural resolution of those embedded images.  
That means assembling those few hundred JPEG images in the right 
pattern.  And I'd like to be able to do that automatically for a large 
number of these files.  So:

- Does pdfimages write out the images in an order that has some 
guaranteed relationship to the position of the images on the page?
- Can pdfimages be hacked to output some hint of the positions of each image?
- Alternatively, if I have to use e.g. pdftoppm rather than pdfimages, 
can I somehow determine the correct resolution to tell pdftoppm to use 
to get the natural resolution of the embedded images?
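
To make the last question concrete, here is a rough sketch of the arithmetic 
I have in mind.  It assumes I can somehow learn, for each embedded image, 
its size in pixels and the width in PDF points (1/72 inch) that it occupies 
on the page; getting those numbers out of the PDF is exactly the part I don't 
know how to do.  The function names are made up for illustration and are not 
part of poppler.

```python
# Hypothetical helpers: nothing here is a poppler API.
POINTS_PER_INCH = 72.0  # PDF user-space units per inch (default)


def natural_dpi(pixels, extent_pts):
    """DPI at which an image `pixels` wide, drawn across `extent_pts`
    points on the page, maps one image sample to one output pixel."""
    if extent_pts <= 0:
        raise ValueError("extent_pts must be positive")
    return pixels * POINTS_PER_INCH / extent_pts


def page_size_px(page_w_pts, page_h_pts, dpi):
    """Pixel dimensions of the whole page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return (round(page_w_pts * scale), round(page_h_pts * scale))


def tile_offset_px(x_left_pts, y_top_pts, page_h_pts, dpi):
    """Top-left pixel offset of a tile in the assembled raster.

    PDF coordinates have their origin at the bottom-left of the page,
    whereas raster images count rows from the top, so y is flipped.
    """
    scale = dpi / POINTS_PER_INCH
    return (round(x_left_pts * scale),
            round((page_h_pts - y_top_pts) * scale))
```

For example, an image 600 pixels wide placed across 144 points (2 inches) 
would give natural_dpi(600, 144) = 300, which is the figure I would then 
pass to pdftoppm with -r, assuming all the tiles share the same resolution.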

Any suggestions would be much appreciated.  If you're curious: a few 
years ago, before the USGS entered the "internet age", a large number of 
digitised maps were obtained in TIFF format on DVDs by the Libre Map 
Project.  Although the intention was to get everything, there were a few 
gaps, including the whole of the state of Massachusetts; one theory is 
that that DVD was lost somewhere along the line.  Now, USGS has an 
online store where the maps can be downloaded free - but they're now 
PDFs, not TIFFs.  So I'd like to be able to convert these PDFs into 
TIFFs that can be used alongside the old ones.

There is also the question of the geo-location data which is embedded 
in there too, somehow.  But I'll worry about that later.


Many thanks,

Phil.




