[poppler] Retrieve all objects from a PDF file

Leonard Rosenthol lrosenth at adobe.com
Tue Nov 1 05:26:09 PDT 2011


Why would you iterate over the objects without any understanding of their
context?  Wouldn't it make MUCH MORE sense to "walk the tree", starting
at the Catalog/Root and then simply recursing down the object tree based
on known relationships?

What use are the objects without context?

Leonard
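
A minimal sketch of the "walk the tree" idea, for illustration only. It
does not use Poppler's API: the `Ref` class, the `walk` function and the
dict-based `objects` table are all hypothetical stand-ins for indirect
references and a parsed file. The point it shows is that a traversal from
the root reports each dictionary together with the path that reached it,
and that a visited set is needed because entries such as /Parent point
back up the tree.

```python
from typing import Any, Dict, Iterator, NamedTuple, Tuple


class Ref(NamedTuple):
    """Hypothetical stand-in for an indirect reference such as `12 0 R`."""
    num: int


def walk(objects: Dict[int, Any], root: Ref) -> Iterator[Tuple[str, dict]]:
    """Yield (path, dictionary) for every dictionary reachable from root."""
    seen = set()                      # object numbers already resolved
    stack = [("/Root", root)]
    while stack:
        path, value = stack.pop()
        if isinstance(value, Ref):
            if value.num in seen:     # cycle or shared object: skip it
                continue
            seen.add(value.num)
            if value.num not in objects:
                continue              # dangling reference
            value = objects[value.num]
        if isinstance(value, dict):
            yield path, value
            for key, child in value.items():
                stack.append((path + key, child))
        elif isinstance(value, list):
            for index, child in enumerate(value):
                stack.append((f"{path}[{index}]", child))


# Toy document: object 4 exists in the table but nothing refers to it.
objects = {
    1: {"/Type": "/Catalog", "/Pages": Ref(2)},
    2: {"/Type": "/Pages", "/Kids": [Ref(3)], "/Count": 1},
    3: {"/Type": "/Page", "/Parent": Ref(2),
        "/Annots": [{"/Subtype": "/Link"}]},
    4: {"/Type": "/Orphan"},
}

found = dict(walk(objects, Ref(1)))
for path in sorted(found):
    print(path, found[path].get("/Type", found[path].get("/Subtype")))
```

One consequence of this design is that objects which are present in the
file but unreachable from the root (object 4 in the toy table) are never
reported. Whether that is acceptable depends on why the objects are being
collected.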

On 11/1/11 7:55 AM, "Nedim Srndic" <nedim.sh at gmail.com> wrote:

>I'm sorry, I see now that I wasn't clear enough. I would like to
>enumerate every PDF dictionary from a given PDF file, including but not
>limited to the Catalog, Pages, Actions, Annotations, Name tree -
>everything. Currently I can successfully do that for all dictionaries
>that can be located using XRef, but it seems that indirect objects
>inside object streams cannot be found this way. I could obviously test
>if any of the objects pointed to by the XRef is an object stream and get
>all the objects from the stream, but I'm wondering if Poppler has a more
>elegant solution. 
>
>Nedim
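
For reference, the manual approach described above (detect an object
stream, then pull the objects out of it) can be sketched as follows. This
is not Poppler code. It assumes the stream has already been decoded (for
example, inflated), that the stream dictionary's /N and /First values are
passed in as `n` and `first`, and that the offsets in the header are in
increasing order. The function name `split_object_stream` is made up for
the example. It relies only on the layout of an object stream: N pairs of
"object number, offset" integers, followed at byte /First by the objects
themselves, stored without `obj`/`endobj` keywords.

```python
from typing import Dict


def split_object_stream(decoded: bytes, n: int, first: int) -> Dict[int, bytes]:
    """Map object number -> raw object bytes for a decoded object stream."""
    header = decoded[:first].split()
    if len(header) < 2 * n:
        raise ValueError("object stream header is shorter than /N pairs")
    numbers = [int(token) for token in header[0:2 * n:2]]
    offsets = [int(token) for token in header[1:2 * n:2]]

    bodies = {}
    for index, (number, offset) in enumerate(zip(numbers, offsets)):
        start = first + offset        # offsets are relative to /First
        if index + 1 < n:
            end = first + offsets[index + 1]
        else:
            end = len(decoded)        # last object runs to end of data
        bodies[number] = decoded[start:end].strip()
    return bodies


# Hand-built payload holding objects 11 and 12.
first_obj = b"<< /Type /Font >> "
second_obj = b"<< /S /URI >>"
header = b"11 0 12 %d " % len(first_obj)
decoded = header + first_obj + second_obj

result = split_object_stream(decoded, n=2, first=len(header))
for number, body in result.items():
    print(number, body)
```

The returned byte strings still have to be parsed into dictionaries,
arrays and so on. The sketch only covers locating each object inside the
stream.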
>
>On Mon, 2011-10-31 at 11:12 -0700, Josh Richardson wrote:
>> What kinds of objects are you interested in?  I have a version of
>> pdftohtml which I believe is not yet merged into the master repo that
>> extracts images and fonts.
>> 
>> --josh
>> 
>> On 10/31/11 9:16 AM, "Nedim Srndic" <nedim.sh at gmail.com> wrote:
>> 
>> >Dear list, 
>> >
>> >I am using the Poppler library (in the src/poppler folder, no bindings,
>> >version 7 from the Ubuntu 10.10 repos) and would like to retrieve all
>> >objects from a PDF file. Currently, I am running a loop on XRef and
>> >getting all the non-null objects from it, but it doesn't seem to
>> >retrieve objects from object streams. What solution would you propose
>> >for this problem?
>> >
>> >Thanks, 
>> >Nedim Srndic
>> >
>> >_______________________________________________
>> >poppler mailing list
>> >poppler at lists.freedesktop.org
>> >http://lists.freedesktop.org/mailman/listinfo/poppler
>> >
>> 


