[poppler] Retrieve all objects from a PDF file

Wed Nov 2 06:09:16 PDT 2011

I tried Poppler 13 from Ubuntu 11.10 and got the same results. As far
as I understand, if I look up an object with XRef::fetch() and that
object lives in an object stream, XRef decompresses it and returns it
to me transparently, so I never need to know that it was compressed in
the first place. Is that correct? If not, what approach should I take?

That said, I tried this approach with both Poppler 7 and 13 on two PDF
files that contain object streams. When I call XRef->fetch() with
generation number 0 and the object number of an object stored in an
object stream, I get a null object for every object except the first
one packed in that stream, and even the first one is not extracted
fully. Is this a known issue?
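
For reference, here is a minimal sketch of how objects are laid out
inside an already decoded object stream, as far as I understand the PDF
specification: the data starts with N pairs of integers (object number,
byte offset relative to /First), followed by the objects themselves.
This is not Poppler code. The function name splitObjectStream and its
parameters are hypothetical, and it assumes the offsets are in
increasing order.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Poppler.
// data:  decoded (already decompressed) content of an object stream
// n:     value of /N in the stream dictionary (number of objects)
// first: value of /First (byte offset of the first object in data)
// Returns a map from object number to the raw text of that object,
// or an empty map if the header is malformed.
std::map<int, std::string> splitObjectStream(const std::string &data,
                                             int n, std::size_t first)
{
    std::map<int, std::string> objects;
    if (n <= 0 || first > data.size())
        return objects;

    // The header holds n pairs: object number, offset relative to first.
    std::istringstream header(data.substr(0, first));
    std::vector<std::pair<int, std::size_t> > index;
    for (int i = 0; i < n; ++i) {
        int num;
        std::size_t offset;
        if (!(header >> num >> offset))
            return std::map<int, std::string>(); // fewer pairs than /N
        index.push_back(std::make_pair(num, offset));
    }

    // Each object runs from its own offset to the next object's offset
    // (assumed increasing), or to the end of the data for the last one.
    for (std::size_t i = 0; i < index.size(); ++i) {
        std::size_t begin = first + index[i].second;
        std::size_t end = (i + 1 < index.size())
                              ? first + index[i + 1].second
                              : data.size();
        if (begin > end || end > data.size())
            return std::map<int, std::string>(); // offsets out of range

        std::string text = data.substr(begin, end - begin);
        // Drop trailing whitespace that separates adjacent objects.
        text.erase(text.find_last_not_of(" \t\r\n\f") + 1);
        objects[index[i].first] = text;
    }
    return objects;
}
```

With the decoded data "11 0 12 9 <</A 1>> (hi)", N = 2 and First = 10,
this maps object 11 to "<</A 1>>" and object 12 to "(hi)". Getting only
the first object back would be consistent with the index not being
walked past its first pair, but that is only a guess on my side.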

Nedim

On Mon, 2011-10-31 at 11:12 -0700, Josh Richardson wrote:
> What kinds of objects are you interested in?  I have a version of
> pdftohtml which I believe is not yet merged into the master repo that
> extracts images and fonts.
> 
> --josh
> 
> On 10/31/11 9:16 AM, "Nedim Srndic" <nedim.sh at gmail.com> wrote:
> 
> >Dear list, 
> >
> >I am using the Poppler library (in the src/poppler folder, no bindings,
> >version 7 from the Ubuntu 10.10 repos) and would like to retrieve all
> >objects from a PDF file. Currently, I am running a loop on XRef and
> >getting all the non-null objects from it, but it doesn't seem to
> >retrieve objects from object streams. What solution would you propose
> >for this problem?
> >
> >Thanks, 
> >Nedim Srndic
> >
> >_______________________________________________
> >poppler mailing list
> >poppler at lists.freedesktop.org
> >http://lists.freedesktop.org/mailman/listinfo/poppler
> >
>