[poppler] Reading Meta Information from PDF

Mathieu Malaterre mathieu.malaterre at gmail.com
Sun Apr 11 00:59:21 PDT 2010


On Sun, Apr 11, 2010 at 3:42 AM, Brad Hards <bradh at frogmouth.net> wrote:
> On Thursday 08 April 2010 08:35:15 pm Mathieu Malaterre wrote:
>>   This is slightly off topic for poppler. I am looking for a way to read
>> the Meta Information of a PDF file (basically the output of pdfinfo).
> This isn't a lot of context to work with, so I'm guessing what might work for
> you.
>> I find it a little bit cumbersome to integrate poppler (license issue,
>> no real need for a full PDF rendering library). Could someone suggest
>> another solution for reading this meta information from PDF files?
> If you don't want to use poppler / pdfinfo, you could buy the adobe libraries,
> or you could try pdftk. Podofo may also be a possibility.

I should have mentioned that this is for integration into an open
source, cross-platform toolkit with a BSD license. For now I use tricks
to link against the private headers of the system-installed poppler
(due to API changes). But I still lack a PDF parser for Win32 platforms.

>>   Will a simple regex (such as "<rdf:RDF.*</rdf:RDF>") work?
> I do not think this will work in general. It might work for all the PDF files
> you care about though. Read the PDF specification (Section 10.2.2 or
> thereabouts) for information on the metadata stream(s).
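
For the record, the regex idea can be sketched in a few lines of
Python. This is only a rough illustration under stated assumptions: the
names XMP_RE and find_xmp_packets are made up for this example, the
sample bytes are a hand-written fragment and not a valid PDF, and the
search only succeeds when the metadata stream is stored uncompressed
and unencrypted. A file with incremental updates may also contain
several packets, so the sketch returns all matches.

```python
import re

# Non-greedy match over raw bytes; DOTALL lets '.' span newlines.
# Works only on metadata streams stored without compression or encryption.
XMP_RE = re.compile(rb"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def find_xmp_packets(pdf_bytes):
    """Return every plain-text rdf:RDF block found in the raw bytes."""
    return [m.group(0) for m in XMP_RE.finditer(pdf_bytes)]


# Hand-written fragment for illustration only (not a complete PDF file).
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b"<x:xmpmeta><rdf:RDF "
    b"xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
    b"<rdf:Description/>\n</rdf:RDF></x:xmpmeta>\n"
    b"endstream\nendobj\n"
)

packets = find_xmp_packets(sample)
for packet in packets:
    print(packet.decode("utf-8"))
```

On real files one would read the bytes with open(path, "rb").read()
and pass them to find_xmp_packets. An empty result does not prove the
file has no metadata; it may simply be compressed or encrypted, which
is why this cannot work in general.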

If I find some time, I might get started with this Python parser I
found on the net:

http://blog.didierstevens.com/programs/pdf-tools/

It is self-contained and focuses on exactly what I am looking for: a
stream interface for PDF people (what SAX is to XML people).

Thanks anyway,
-- 
Mathieu

