[poppler] Extracting geolocation metadata from a GeoPDF file

Leonard Rosenthol lrosenth at adobe.com
Sun Mar 28 09:16:34 PDT 2010


Be aware that there are TWO SEPARATE "standards" for GIS information in a PDF file: the TerraGo specification that you describe below, and the one that the ISO 32000 (aka PDF standard) committee chose for inclusion in PDF 2.0 (32000-2), which is already implemented in Adobe Acrobat & Reader 9.  You may want or need to support both.
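
As a rough illustration of telling the two encodings apart, here is a minimal sketch that scans the raw bytes of a file for one marker from each.  It assumes that the TerraGo encoding shows up as the 'LGIDict' key and that the ISO/Adobe encoding shows up as a geospatial measure dictionary carrying a 'GPTS' key.  The names GeoMarkers, readFile and scanGeoMarkers are hypothetical, nothing here uses poppler, and a plain byte scan will miss dictionaries stored in compressed object streams, so treat a negative result as inconclusive.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Which geospatial encodings the raw bytes of a PDF appear to contain.
struct GeoMarkers {
    bool terrago = false;  // "/LGIDict" key seen (TerraGo GeoPDF)
    bool iso = false;      // "/GPTS" key seen (ISO 32000-2 / Adobe measure dictionary)
};

// Hypothetical helper: read a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: crude substring scan, not a PDF parser.
// It only sees keys that are stored uncompressed in the file.
GeoMarkers scanGeoMarkers(const std::string& bytes) {
    GeoMarkers m;
    m.terrago = bytes.find("/LGIDict") != std::string::npos;
    m.iso = bytes.find("/GPTS") != std::string::npos;
    return m;
}
```

Calling scanGeoMarkers(readFile("map.pdf")) and checking the two flags would give a first hint about which encoding, if either, a given file carries; a real tool would walk the page dictionaries with a PDF library instead.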

Leonard

-----Original Message-----
From: poppler-bounces at lists.freedesktop.org [mailto:poppler-bounces at lists.freedesktop.org] On Behalf Of Phil Endecott
Sent: Friday, March 26, 2010 1:30 PM
To: poppler at lists.freedesktop.org
Subject: [poppler] Extracting geolocation metadata from a GeoPDF file

Dear Experts,

I need to extract the geolocation metadata from a GeoPDF file.

If you're not familiar with this format, it's something that was
developed by a company called TerraGo Technologies and was adopted as a
"best practice" by the Open Geospatial Consortium.  There is a document
describing it via http://www.opengeospatial.org/standards/bp (look for
"GeoPDF"; click-through but free-looking license required).  Basically
it provides a method to associate positions in the document with
latitude-longitude positions on the ground.

The method used is to define "map frames" that are added to the parent
PDF page object.  As I understand it, there is a new key 'LGIDict' in
the page object, whose value is an array of dictionaries, one per map
frame.  Each dictionary contains a set of entries (matrices, bounding
boxes etc.) that define the geolocation for that frame.

My hope is that it would be possible to write a small program that
would open the PDF file and iterate through the pages, dumping this
data as it is found.  I have had a look at the pdfinfo program which
seems to be doing something similar for other metadata.
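
To make the idea concrete, here is a minimal sketch of such a dump that does not use poppler at all: it searches the raw bytes for the 'LGIDict' key and copies out the array that follows it, matching square brackets to find the end.  The function name extractLgiArrays is hypothetical.  The sketch assumes the array is stored inline and uncompressed; it skips values given as indirect references (such as "12 0 R"), and it does not account for brackets that appear inside PDF string literals.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the raw text of every array that directly
// follows a "/LGIDict" key in the given bytes.
std::vector<std::string> extractLgiArrays(const std::string& bytes) {
    const std::string key = "/LGIDict";
    std::vector<std::string> out;
    std::size_t pos = 0;
    while ((pos = bytes.find(key, pos)) != std::string::npos) {
        pos += key.size();
        // Skip whitespace between the key and its value.
        std::size_t i = pos;
        while (i < bytes.size() && (bytes[i] == ' ' || bytes[i] == '\t' ||
                                    bytes[i] == '\r' || bytes[i] == '\n')) {
            ++i;
        }
        // Only inline arrays are handled; anything else is skipped.
        if (i >= bytes.size() || bytes[i] != '[') continue;
        // Match nested brackets to find the end of the array.
        int depth = 0;
        for (std::size_t j = i; j < bytes.size(); ++j) {
            if (bytes[j] == '[') {
                ++depth;
            } else if (bytes[j] == ']' && --depth == 0) {
                out.push_back(bytes.substr(i, j - i + 1));
                pos = j + 1;
                break;
            }
        }
    }
    return out;
}
```

Reading the file into a std::string and printing each returned element would give one raw array per page that carries the key.  Interpreting the individual entries still requires a real PDF parser.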

Would anyone be able to help me with this?  I am a competent C++ coder
but have never had to understand much about how PDF works.  Is starting
with pdfinfo sane?  How do I access the page objects?  Is there a way
to iterate through the entries in a dictionary?

Or, maybe there is already some other tool that will do this?

Here's an example of a GeoPDF file:  ftp://ftp2.cits.rncan.gc.ca/pub/cantopo/50k_pdf/092/g/cantopo_092g06_pdf.zip


Many thanks for any advice.


Regards,  Phil.



