[poppler] Retrieve all objects from a PDF file

Nedim Srndic nedim.sh at gmail.com
Tue Nov 1 04:55:22 PDT 2011


I'm sorry, I see now that I wasn't clear enough. I would like to
enumerate every PDF dictionary in a given PDF file, including but not
limited to the Catalog, Pages, Actions, Annotations and Name trees; in
short, everything. Currently I can do that for all dictionaries that
can be reached through XRef, but it seems that indirect objects stored
inside object streams cannot be found this way. I could obviously test
whether each object pointed to by the XRef is an object stream and
extract all the objects from that stream myself, but I'm wondering
whether Poppler has a more elegant solution.
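
To make the manual fallback concrete, here is a minimal sketch of
splitting one object stream into its objects. It does not use Poppler's
API at all. It assumes the stream data has already been decoded (all
filters removed) and that /N and /First have been read from the stream
dictionary. The names splitObjectStream and ObjStmEntry are hypothetical
and exist only for this illustration. The layout it relies on comes from
the PDF specification: the decoded data starts with N pairs of integers
(object number, byte offset relative to /First), followed by the objects
themselves without obj/endobj keywords.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One object recovered from a decoded object stream (hypothetical type).
struct ObjStmEntry {
    int objNum;        // object number; the generation is always 0 here
    std::string body;  // raw, still unparsed bytes of the object
};

// decoded: object stream data after all filters have been removed.
// n, first: the /N and /First values from the stream dictionary.
std::vector<ObjStmEntry> splitObjectStream(const std::string &decoded,
                                           int n, std::size_t first)
{
    std::vector<ObjStmEntry> entries;
    if (n <= 0 || first > decoded.size())
        return entries;

    // The header occupies the bytes before /First and holds n pairs of
    // "object-number offset", separated by white space.
    std::istringstream header(decoded.substr(0, first));
    std::vector<int> nums;
    std::vector<std::size_t> starts;
    for (int i = 0; i < n; ++i) {
        int num;
        std::size_t off;
        if (!(header >> num >> off))
            break;  // malformed or truncated header
        if (off > decoded.size() - first)
            break;  // offset points past the end of the data
        nums.push_back(num);
        starts.push_back(first + off);
    }

    // Each object runs up to the start of the next one; the last object
    // runs to the end of the decoded data.
    for (std::size_t i = 0; i < nums.size(); ++i) {
        std::size_t begin = starts[i];
        std::size_t end = (i + 1 < nums.size()) ? starts[i + 1]
                                                : decoded.size();
        if (end < begin)
            end = begin;  // offsets out of order: yield an empty body
        entries.push_back({nums[i], decoded.substr(begin, end - begin)});
    }
    return entries;
}
```

Each returned body would still have to be handed to a PDF object parser
to find out whether it is a dictionary, and the results merged with the
objects found through the plain XRef loop.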

Nedim

On Mon, 2011-10-31 at 11:12 -0700, Josh Richardson wrote:
> What kinds of objects are you interested in?  I have a version of
> pdftohtml which I believe is not yet merged into the master repo that
> extracts images and fonts.
> 
> --josh
> 
> On 10/31/11 9:16 AM, "Nedim Srndic" <nedim.sh at gmail.com> wrote:
> 
> >Dear list, 
> >
> >I am using the Poppler library (in the src/poppler folder, no bindings,
> >version 7 from the Ubuntu 10.10 repos) and would like to retrieve all
> >objects from a PDF file. Currently, I am running a loop on XRef and
> >getting all the non-null objects from it, but it doesn't seem to
> >retrieve objects from object streams. What solution would you propose
> >for this problem?
> >
> >Thanks, 
> >Nedim Srndic
> >
> >
> 



