[poppler] Extracting geolocation metadata from a GeoPDF file
Leonard Rosenthol
lrosenth at adobe.com
Sun Mar 28 09:16:34 PDT 2010
Be aware that there are TWO SEPARATE "standards" for GIS information in a PDF file - the TerraGo specification that you note below, and the one that the ISO 32000 (aka PDF standard) committee chose for inclusion in PDF 2.0 (32000-2) which is currently implemented in Adobe Acrobat & Reader 9. You may want/need to support both.
Leonard
-----Original Message-----
From: poppler-bounces at lists.freedesktop.org [mailto:poppler-bounces at lists.freedesktop.org] On Behalf Of Phil Endecott
Sent: Friday, March 26, 2010 1:30 PM
To: poppler at lists.freedesktop.org
Subject: [poppler] Extracting geolocation metadata from a GeoPDF file
Dear Experts,
I need to extract the geolocation metadata from a GeoPDF file.
If you're not familiar with this format, it's something that was
developed by a company called TerraGo Technologies and was adopted as a
"best practice" by the Open Geospatial Consortium. There is a document
describing it via http://www.opengeospatial.org/standards/bp (look for
"GeoPDF"; click-through but free-looking license required). Basically
it provides a method to associate positions in the document with
latitude-longitude positions on the ground.
The method used is to define "map frames" that are added to the parent
PDF page object. As I understand it, there is a new key 'LGIDict' in
the page object, which is an array of dictionaries one per map frame,
each of which contains a set of entries such as matrices,
bounding boxes etc. that define the geolocation for that frame.
My hope is that it would be possible to write a small program that
would open the PDF file and iterate through the pages, dumping this
data as it is found. I have had a look at the pdfinfo program which
seems to be doing something similar for other metadata.
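As a first step before writing such a program against poppler, a crude byte scan can at least show whether a file carries the 'LGIDict' key at all. The following is only a sketch under stated assumptions: it does not parse PDF, the function names (find_name_token, scan_file) are made up for illustration, and it will miss keys stored inside compressed object streams. It uses only the C++ standard library, not poppler.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// True if `c` ends a PDF name token: PDF whitespace or a PDF delimiter.
bool is_name_terminator(unsigned char c) {
    static const std::string delimiters = "()<>[]{}/%";
    return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ' ||
           delimiters.find(static_cast<char>(c)) != std::string::npos;
}

// Hypothetical helper: returns the byte offsets at which the name token
// "/<name>" occurs in `data`. A hit only counts when the following byte
// cannot continue the name, so "/LGIDict" does not match "/LGIDictionary".
std::vector<std::size_t> find_name_token(const std::string& data,
                                         const std::string& name) {
    assert(!name.empty());
    const std::string token = "/" + name;
    std::vector<std::size_t> hits;
    std::size_t pos = 0;
    while ((pos = data.find(token, pos)) != std::string::npos) {
        const std::size_t end = pos + token.size();
        if (end == data.size() ||
            is_name_terminator(static_cast<unsigned char>(data[end]))) {
            hits.push_back(pos);
        }
        pos = end;
    }
    return hits;
}

// Hypothetical helper: reads the whole file as raw bytes and scans it.
// Assumption: the page dictionaries are stored uncompressed in the file.
std::vector<std::size_t> scan_file(const std::string& path,
                                   const std::string& name) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return find_name_token(data, name);
}
```

Calling scan_file(path, "LGIDict") from a small main() and printing the returned offsets shows where in the file to look with a hex or text viewer. An empty result does not prove the key is absent, only that it is not visible in the uncompressed bytes. Any other key name can be passed in the same way.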
Would anyone be able to help me with this? I am a competent C++ coder
but have never had to understand much about how PDF works. Is starting
with pdfinfo sane? How do I access the page objects? Is there a way
to iterate through the entries in a dictionary?
Or, maybe there is already some other tool that will do this?
Here's an example of a GeoPDF file: ftp://ftp2.cits.rncan.gc.ca/pub/cantopo/50k_pdf/092/g/cantopo_092g06_pdf.zip
Many thanks for any advice.
Regards, Phil.
_______________________________________________
poppler mailing list
poppler at lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler