[poppler] Help with pdfimages on USGS maps
Phil Endecott
spam_from_poppler at chezphil.org
Thu Mar 19 14:17:14 PDT 2009
Dear All,
USGS maps can be downloaded, free, from their web site at
http://store.usgs.gov/. Here's an example of what you can get (17
MByte file): http://chezphil.org/tmp/Boston_South_K42071C1_geo.PDF.
It's a PDF from which pdfimages will happily extract a few hundred JPEGs.
What I'd like to do is to assemble a single large raster image (TIFF,
JPEG, whatever) at the natural resolution of those embedded images.
That means assembling those few hundred JPEG images in the right
pattern. And I'd like to be able to do that automatically for a large
number of these files. So:
- Does pdfimages write out the images in an order that has some
guaranteed relationship to the position of the images on the page?
- Can pdfimages be hacked to output some hint of the positions of each image?
- Alternatively, if I have to use e.g. pdftoppm rather than pdfimages,
can I somehow determine the correct resolution to tell pdftoppm to use
to get the natural resolution of the embedded images?
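For the last question, I assume the arithmetic would be roughly as follows; this is only a sketch, not anything taken from poppler. It assumes the default PDF user-space unit of 1/72 inch and tiles that are placed without rotation or skew. The function names and the example numbers are hypothetical, and the placed size and position of each tile would still have to come from somewhere (for instance a patched pdfimages).

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


def natural_dpi(pixels, placed_points):
    """Resolution at which one image sample maps to one output pixel.

    pixels        -- width (or height) of the embedded image in samples
    placed_points -- width (or height) it occupies on the page, in points
    """
    if placed_points <= 0:
        raise ValueError("placed size must be positive")
    return pixels * POINTS_PER_INCH / placed_points


def tile_offset(x_pt, y_top_pt, page_height_pt, dpi):
    """Pixel offset of a tile's top-left corner in the assembled raster.

    PDF coordinates have their origin at the bottom-left of the page,
    whereas raster rows count down from the top, so y is flipped.
    """
    col = round(x_pt * dpi / POINTS_PER_INCH)
    row = round((page_height_pt - y_top_pt) * dpi / POINTS_PER_INCH)
    return col, row


# Hypothetical tile: 500 samples wide, drawn 144 points (2 inches) wide,
# with its top-left corner at (144, 648) on a 792-point-high page.
dpi = natural_dpi(500, 144)
offset = tile_offset(144, 648, 792, dpi)
```

If every tile on a page gives the same figure from natural_dpi, that would be the resolution to pass to pdftoppm; if the figures differ, a single rendering resolution cannot be "natural" for all of them.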
Any suggestions would be much appreciated. If you're curious: a few
years ago, before the USGS entered the "internet age", a large number of
digitised maps were obtained in TIFF format on DVDs by the Libre Map
Project. Although the intention was to get everything, there were a few
gaps, including the whole of the state of Massachusetts; one theory is
that that DVD was lost somewhere along the line. Now, USGS has an
online store where the maps can be downloaded free - but they're now
PDFs, not TIFFs. So I'd like to be able to convert these PDFs into
TIFFs that can be used alongside the old ones.
There is also the question of the geo-location data which is embedded
in there too, somehow. But I'll worry about that later.
Many thanks,
Phil.