[poppler] Retrieve all objects from a PDF file

Albert Astals Cid aacid at kde.org
Fri Nov 25 10:28:40 PST 2011


On Friday, 25 November 2011, at 09:18:18, Nedim Srndic wrote:
> On Thu, 2011-11-24 at 18:48 +0100, Albert Astals Cid wrote:
> > On Thursday, 24 November 2011, at 09:46:51, you wrote:
> > > On Fri, 2011-11-04 at 13:15 +0100, Albert Astals Cid wrote:
> > > > On Friday, 4 November 2011, Nedim Srndic wrote:
> > > > > What is the preferred way to retrieve an indirect object
> > > > > from an object stream?
> > > > 
> > > > ObjectStream::getObject ?
> > > 
> > > But the ObjectStream class is not publicly accessible.
> > > 
> > > If I know that there is an object with number X in an Object Stream,
> > > and the XRef returns null when I query for it, is that a bug? If
> > > not, how can I get it?
> > 
> > What about debugging the code? This is a developers list, so if you face
> > something you think is a bug, you debug it, and if you do not have the
> > knowledge to debug it, you file a bug and give a PDF to test. But saying
> > "XRef returns null when I query for it" is not enough: you don't say
> > which code you use and you don't give a PDF, so what do you expect us
> > to do?
> > 
> > Albert
> 
> Yes, this is a developers list, but I didn't find a users mailing list

Because there is no users mailing list.

> and did not want to use IRC because somebody may have the same question
> later. I did not want to file a bug because most projects encourage
> users to first discuss their problem before submitting a bug report. I
> did say which code I use and I described the problem as well (and as
> briefly) as I could in the very first email, sent almost one month ago.

You mean "I am running a loop on XRef and getting all the non-null objects 
from it" is your code? That's not what I call "code", but fair enough.

> I will interpret your answer as an invitation to submit a bug report. 

Yes please, I'd prefer you to debug the code and maybe submit a patch if you 
really find the problem, but if you are not going to do it, file a bug report 
with the smallest chunk of code that reproduces your problem and a pdf file to 
reproduce it.
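
To make the object stream part of this thread concrete, here is a minimal
sketch of how objects are located inside an object stream, following the
layout described in the PDF specification: the decoded stream data starts
with /N pairs of integers (object number, byte offset relative to /First),
followed by the objects themselves. This is standalone C++ and does not use
Poppler; parseObjStmHeader and extractObject are hypothetical names, and the
input is assumed to be the stream content after its filters were applied.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Poppler.
// Reads the header of an already decoded object stream and fills `offsets`
// with (object number -> absolute byte offset inside `decoded`).
// `n` is the /N entry and `first` the /First entry of the stream dictionary.
bool parseObjStmHeader(const std::string &decoded, int n, std::size_t first,
                       std::map<long long, std::size_t> &offsets)
{
    offsets.clear();
    if (n < 0 || first > decoded.size())
        return false;
    std::istringstream header(decoded.substr(0, first));
    for (int i = 0; i < n; ++i) {
        long long objNum = 0;
        long long relOffset = 0;
        if (!(header >> objNum >> relOffset))
            return false; // fewer pairs than /N promised
        if (objNum < 0 || relOffset < 0 ||
            static_cast<std::size_t>(relOffset) > decoded.size() - first)
            return false; // offset points outside the stream data
        offsets[objNum] = first + static_cast<std::size_t>(relOffset);
    }
    return true;
}

// Hypothetical helper, not part of Poppler.
// Copies the raw text of object `objNum` into `out`. The object is taken to
// end where the next object starts, or at the end of the stream data.
bool extractObject(const std::string &decoded, int n, std::size_t first,
                   long long objNum, std::string &out)
{
    std::map<long long, std::size_t> offsets;
    if (!parseObjStmHeader(decoded, n, first, offsets))
        return false;
    const auto it = offsets.find(objNum);
    if (it == offsets.end())
        return false; // object is not stored in this stream
    const std::size_t start = it->second;
    std::size_t end = decoded.size();
    for (const auto &entry : offsets)
        if (entry.second > start && entry.second < end)
            end = entry.second;
    out = decoded.substr(start, end - start);
    // Drop the whitespace that separates this object from the next one.
    while (!out.empty() && std::isspace(static_cast<unsigned char>(out.back())))
        out.pop_back();
    return true;
}
```

With a decoded stream such as "11 0 12 15 <</Type/Font>> [1 2 3]", /N 2 and
/First 11, extractObject yields the dictionary for object 11 and the array
for object 12. If a test file behaves like this on paper but a lookup still
returns null for everything after the first object, that comparison is the
kind of detail worth putting in the bug report.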

> I hope Poppler gets useful documentation, typical usage examples and more
> manpower in the future.

That's not going to happen: almost everyone (you included) is interested in 
just becoming a user of the library, but when you ask people to try to debug 
their problem they shy away. 

Cheers,
  Albert

> 
> Greetings,
> Nedim
> 
> > > Greetings,
> > > Nedim
> > > 
> > > > > Is it possible that I have found a bug? This is really
> > > > > important for me.
> > > > 
> > > > Albert
> > > > 
> > > > > Nedim
> > > > > 
> > > > > On Wed, 2011-11-02 at 14:09 +0100, Nedim Srndic wrote:
> > > > > > > I tried out Poppler 13 from Ubuntu 11.10 and I get the same
> > > > > > > results. As far as I understand, if I look for an object in
> > > > > > > XRef using fetch(), and that object is in an object stream,
> > > > > > > the XRef then uncompresses the object and returns it to me,
> > > > > > > so that I don't even know that it was compressed in the
> > > > > > > first place? If things don't work this way, what approach
> > > > > > > should I take?
> > > > > > > 
> > > > > > > That being said, I tried this approach with both Poppler 7
> > > > > > > and 13 and two PDF files with object streams. When I do an
> > > > > > > XRef->fetch() with generation number 0 and object number of
> > > > > > > an object in the object stream, I get a null object for all
> > > > > > > objects except the first one that is packed in the object
> > > > > > > stream. The first one isn't extracted fully. Is this a known
> > > > > > > issue?
> > > > > > 
> > > > > > Nedim
> > > > > > 
> > > > > > On Mon, 2011-10-31 at 11:12 -0700, Josh Richardson wrote:
> > > > > > > > What kinds of objects are you interested in?  I have a
> > > > > > > > version of pdftohtml which I believe is not yet merged
> > > > > > > > into the master repo that extracts images and fonts.
> > > > > > > 
> > > > > > > --josh
> > > > > > > 
> > > > > > > On 10/31/11 9:16 AM, "Nedim Srndic" <nedim.sh at gmail.com> wrote:
> > > > > > > >Dear list,
> > > > > > > >
> > > > > > > > >I am using the Poppler library (in the src/poppler
> > > > > > > > >folder, no bindings, version 7 from the Ubuntu 10.10
> > > > > > > > >repos) and would like to retrieve all objects from a PDF
> > > > > > > > >file. Currently, I am running a loop on XRef and getting
> > > > > > > > >all the non-null objects from it, but it doesn't seem to
> > > > > > > > >retrieve objects from object streams. What solution would
> > > > > > > > >you propose for this problem?
> > > > > > > >
> > > > > > > >Thanks,
> > > > > > > >Nedim Srndic
> > > > > > > >
> > > > > > > >_______________________________________________
> > > > > > > >poppler mailing list
> > > > > > > >poppler at lists.freedesktop.org
> > > > > > > > >http://lists.freedesktop.org/mailman/listinfo/poppler
> > > > > 
> > > > > _______________________________________________
> > > > > poppler mailing list
> > > > > poppler at lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/poppler
> > > > 
> > > > _______________________________________________
> > > > poppler mailing list
> > > > poppler at lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/poppler
> > 
> > _______________________________________________
> > poppler mailing list
> > poppler at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/poppler
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

