[poppler] [PATCH] Catalog::getNumPages(): validate page count

Jason Crain jason at aquaticape.us
Wed Sep 16 22:04:12 PDT 2015


On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
> > > I don't know of a good way to validate the page count. Even
> > > going through the page tree might be hard to do right without
> > > leading to an infinite loop, in addition to being slow.
> >
> > Catalog::cachePageTree goes over the tree, but I agree doing that
> > to calculate the number of pages can be meh.
> 
> If the number of pages is huge, the PDF might be intentionally
> corrupted to provoke a bug in a particular PDF viewer, and other
> data structures could be subtly corrupted as well. Any scan would
> have to proceed very cautiously.
> 
> If there is a minimum number of objects required for a page, and if
> the total number of objects is easy to find, could poppler
> immediately reject files with (total num objects) / (min objects per
> page) < page count?

The document at
https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
I'm thinking of.  It has 5 objects and a single page that is listed in
the /Kids array 10 times.  Duplicating the page just means adding it
to the array again and incrementing /Count.  If we want this document
to work, then there's really no minimum number of objects required for
a page.  If we don't, each page would require at least its own /Page
object.
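
To make the trade-off concrete, here is a rough sketch of both ideas:
the proposed object-count check, and a walk of the page tree that
cannot loop forever.  None of this is poppler code.  Node, Objects,
plausibleCount, countPages and the maxVisits budget are all made up
for illustration, and a real version would have to work on poppler's
own Object/XRef data instead of a std::map.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified page-tree model. These are NOT poppler types.
struct Node {
    bool isPage;            // true for a /Page leaf, false for a /Pages node
    std::vector<int> kids;  // object numbers listed in /Kids (unused for leaves)
};
typedef std::map<int, Node> Objects;  // object number -> node

// The proposed quick check: can totalObjs objects hold `declared` pages
// if every page needs at least minObjsPerPage objects of its own?
// A minimum of 0 means no bound can be derived, so everything passes.
bool plausibleCount(std::size_t totalObjs, std::size_t minObjsPerPage,
                    std::size_t declared) {
    if (minObjsPerPage == 0) return true;
    return declared <= totalObjs / minObjsPerPage;
}

// Recursive helper: counts /Page leaves reachable from object `num`.
// Returns false on a dangling reference, a cycle, or an exhausted budget.
static bool walk(const Objects &objs, int num, std::set<int> &onPath,
                 std::size_t &budget, std::size_t &pages) {
    if (budget == 0) return false;  // too much work, give up
    --budget;
    Objects::const_iterator it = objs.find(num);
    if (it == objs.end()) return false;  // dangling reference
    const Node &node = it->second;
    if (node.isPage) {
        ++pages;  // the same leaf may be counted many times
        return true;
    }
    // A /Pages node that is its own ancestor would recurse forever.
    if (!onPath.insert(num).second) return false;
    for (std::size_t i = 0; i < node.kids.size(); ++i) {
        if (!walk(objs, node.kids[i], onPath, budget, pages)) return false;
    }
    onPath.erase(num);
    return true;
}

// Counts pages below `root`, visiting at most maxVisits nodes in total.
bool countPages(const Objects &objs, int root, std::size_t maxVisits,
                std::size_t &pages) {
    std::set<int> onPath;
    std::size_t budget = maxVisits;
    pages = 0;
    return walk(objs, root, onPath, budget, pages);
}
```

With a /Pages node that lists one /Page object 10 times, countPages
reports 10 pages, while plausibleCount(5, 1, 10) rejects the same file.
Only /Pages nodes go into the onPath set, so a leaf can repeat but a
cycle is still caught, and the visit budget keeps the walk from taking
forever on a tree that shares subtrees heavily.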

FWIW Adobe Reader shows an error on the document after the first
duplicated page.  Other viewers show it just fine.

