[poppler] Extracting geolocation metadata from a GeoPDF file
Phil Endecott
spam_from_poppler at chezphil.org
Fri Mar 26 10:29:45 PDT 2010
Dear Experts,
I need to extract the geolocation metadata from a GeoPDF file.
If you're not familiar with this format, it's something that was
developed by a company called TerraGo Technologies and was adopted as a
"best practice" by the Open Geospatial Consortium. There is a document
describing it, available via http://www.opengeospatial.org/standards/bp
(look for "GeoPDF"; a click-through but free-looking license is
required). Basically
it provides a method to associate positions in the document with
latitude-longitude positions on the ground.
The method used is to define "map frames" that are added to the parent
PDF page object. As I understand it, there is a new key 'LGIDict' in
the page object whose value is an array of dictionaries, one per map
frame. Each dictionary contains a set of entries (matrices, bounding
boxes etc.) that define the geolocation for that frame.
My hope is that it would be possible to write a small program that
would open the PDF file and iterate through the pages, dumping this
data as it is found. I have had a look at the pdfinfo program which
seems to be doing something similar for other metadata.
Would anyone be able to help me with this? I am a competent C++ coder
but have never had to understand much about how PDF works. Is starting
with pdfinfo sane? How do I access the page objects? Is there a way
to iterate through the entries in a dictionary?
Or, maybe there is already some other tool that will do this?
Here's an example of a GeoPDF file: ftp://ftp2.cits.rncan.gc.ca/pub/cantopo/50k_pdf/092/g/cantopo_092g06_pdf.zip
Many thanks for any advice.
Regards, Phil.