[poppler] Extracting geolocation metadata from a GeoPDF file

Phil Endecott spam_from_poppler at chezphil.org
Sat Mar 27 12:24:32 PDT 2010


Phil Endecott wrote:
> I need to extract the geolocation metadata from a GeoPDF file.
>
> If you're not familiar with this format, it's something that was
> developed by a company called TerraGo Technologies and was adopted as a
> "best practice" by the Open Geospatial Consortium.  There is a document
> describing it via http://www.opengeospatial.org/standards/bp (look for
> "GeoPDF"; click-through but free-looking license required).  Basically
> it provides a method to associate positions in the document with
> latitude-longitude positions on the ground.
>
> The method used is to define "map frames" that are added to the parent
> PDF page object.  As I understand it, there is a new key 'LGIDict' in
> the page object, which is an array of dictionaries, one per map frame,
> each of which contains a set of entries (matrices, bounding boxes
> etc.) that define the geolocation for that frame.

Well, I have cobbled together something that gets the page objects, and I
can see the 'LGIDict' that I was looking for.  I had to hack Page.h to
make pageObj public; is there some better way that I should have
been doing that?

I was confused for a while because the example GeoPDF that I linked to before:

> Here's an example of a GeoPDF file:
> ftp://ftp2.cits.rncan.gc.ca/pub/cantopo/50k_pdf/092/g/cantopo_092g06_pdf.zip

does not seem to have the LGIDict, nor anything else additional in the page
object.  Of course I assumed my code was wrong for a long time before
suspecting the file.  A test with another file found the LGIDict immediately.

To help with debugging, I was wondering if there is any easy way to 
decode the PDF to a point where I can see its unencoded text (or 
however one describes this first level of encoding).  Specifically the 
GeoPDF document describes stuff at this level:

105 0 obj
<<
     /Type /Page
     /LGIDict 104 0 R
......
>>
endobj
104 0 obj
<<
     /Type /LGIDict
     /Version (2.1)
     /CTM
.......


What do I have to do to see that?
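
Once a file is in that plain form, the check I have in mind is roughly
the following.  This is only a sketch: it assumes the objects are stored
as uncompressed text (not packed into compressed object streams), that
the page refers to the LGIDict by an indirect reference as in the
excerpt above (not an inline array), and the function name
find_lgidict_objects is made up for illustration.

```python
import re

def find_lgidict_objects(pdf_bytes):
    """Map page object number -> body of the object its /LGIDict points at.

    Assumes plain, uncompressed "N G obj ... endobj" syntax (hypothetical
    helper, not part of poppler).
    """
    # Index every "N G obj ... endobj" block by its object number.
    objects = {}
    for m in re.finditer(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", pdf_bytes, re.S):
        objects[int(m.group(1))] = m.group(3)

    found = {}
    for num, body in objects.items():
        # Only consider page objects; \b keeps /Pages tree nodes out.
        if not re.search(rb"/Type\s*/Page\b", body):
            continue
        # Look for an indirect reference such as "/LGIDict 104 0 R".
        ref = re.search(rb"/LGIDict\s+(\d+)\s+(\d+)\s+R", body)
        if ref and int(ref.group(1)) in objects:
            found[num] = objects[int(ref.group(1))].strip()
    return found


# Cut-down version of the excerpt above, with the elided parts left out.
sample = b"""105 0 obj
<<
     /Type /Page
     /LGIDict 104 0 R
>>
endobj
104 0 obj
<<
     /Type /LGIDict
     /Version (2.1)
>>
endobj
"""

result = find_lgidict_objects(sample)
for page_num, lgi_body in result.items():
    print(page_num, lgi_body.decode("latin-1"))
```

A file like the cantopo one, with no LGIDict in any page object, would
simply give an empty result here.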


Regards,

Phil.





