[poppler] Extracting geolocation metadata from a GeoPDF file

Phil Endecott spam_from_poppler at chezphil.org
Fri Mar 26 10:29:45 PDT 2010


Dear Experts,

I need to extract the geolocation metadata from a GeoPDF file.

If you're not familiar with this format, it's something that was
developed by a company called TerraGo Technologies and was adopted as a
"best practice" by the Open Geospatial Consortium.  There is a document
describing it via http://www.opengeospatial.org/standards/bp (look for
"GeoPDF"; a click-through license, apparently free, is required).  Basically
it provides a method to associate positions in the document with
latitude-longitude positions on the ground.

The method used is to define "map frames" that are added to the parent
PDF page object.  As I understand it, there is a new key 'LGIDict' in
the page object whose value is an array of dictionaries, one per map
frame, each of which contains a set of entries, such as matrices and
bounding boxes, that define the geolocation for that frame.

My hope is that it would be possible to write a small program that
would open the PDF file and iterate through the pages, dumping this
data as it is found.  I have had a look at the pdfinfo program, which
seems to do something similar for other metadata.

Would anyone be able to help me with this?  I am a competent C++ coder
but have never had to understand much about how PDF works.  Is starting
with pdfinfo sane?  How do I access the page objects?  Is there a way
to iterate through the entries in a dictionary?

Or, maybe there is already some other tool that will do this?

Here's an example of a GeoPDF file:  ftp://ftp2.cits.rncan.gc.ca/pub/cantopo/50k_pdf/092/g/cantopo_092g06_pdf.zip


Many thanks for any advice.


Regards,  Phil.





