[poppler] [PATCH] Catalog::getNumPages(): validate page count
Jason Crain
jason at aquaticape.us
Thu Sep 17 09:49:27 PDT 2015
On 2015-09-17 08:57, Leonard Rosenthol wrote:
> While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
> made it clear in 32000-2 that you can only have one copy of each page
> in the Pages tree. So personally, I wouldn’t waste much time on this
> particular file.
>
> Leonard
OK, if it's not allowed by the spec, I have no real objection to the
object count check.
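The object count check proposed further down the thread comes down to one comparison: reject a file when (total number of objects) / (minimum objects per page) < declared page count. Below is a minimal C++ sketch of that check. The free function pageCountIsPlausible is hypothetical and not a poppler API. It assumes the caller already knows the file's total object count and the /Count value. The default minObjectsPerPage = 1 encodes "each page needs at least its own /Page object", which only holds if a page may not appear more than once in the Pages tree.

```cpp
// Hypothetical helper (not a poppler API) implementing the proposed
// sanity check: reject the file when
//   totalObjects / minObjectsPerPage < declaredPageCount.
// minObjectsPerPage = 1 assumes each distinct page needs at least its own
// /Page dictionary, which only holds if a page object may not be listed
// more than once in the Pages tree.
bool pageCountIsPlausible(long long declaredPageCount, long long totalObjects,
                          long long minObjectsPerPage = 1) {
    // Negative counts or a non-positive per-page minimum are never valid.
    if (declaredPageCount < 0 || totalObjects < 0 || minObjectsPerPage <= 0) {
        return false;
    }
    // Integer division gives the same answer as exact division here
    // (floor(x) >= n exactly when x >= n for integer n). Unlike
    // multiplying declaredPageCount, it cannot overflow on a huge /Count.
    return totalObjects / minObjectsPerPage >= declaredPageCount;
}
```

For a file with 5 objects that declares /Count 10, such as the duplicated-page sample linked below, pageCountIsPlausible(10, 5) returns false, so the file would be rejected. A single-page file with 5 objects passes.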
> On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain"
> <poppler-bounces at lists.freedesktop.org on behalf of
> jason at aquaticape.us> wrote:
>
>> On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
>>> > > I don't know of a good way to validate the page count. Even
>>> > > going through the page tree might be hard to do right without
>>> > > leading to an infinite loop, in addition to being slow.
>>> >
>>> > Catalog::cachePageTree goes over the tree, but I agree doing that
>>> > just to calculate the number of pages can be meh.
>>>
>>> If the number of pages is huge, the PDF might be intentionally
>>> corrupted to provoke a bug in a particular PDF viewer, and other
>>> data structures could be subtly corrupted as well. Any scan would
>>> have to proceed very cautiously.
>>>
>>> If there is a minimum number of objects required for a page, and if
>>> the total number of objects is easy to find, could poppler
>>> immediately reject files with (total num objects) / (min objects per
>>> page) < page count?
>>
>> The document at
>> https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
>> I'm thinking of. It has 5 objects and a single page that is listed in
>> the /Kids array 10 times. Duplicating the page just means adding it
>> to the array again and incrementing /Count. If we want this document
>> to work, then there's really no minimum number of objects required for
>> a page. Otherwise, each page would require at least a /Page object.
>>
>> FWIW Adobe Reader shows an error on the document after the first
>> duplicated page. Other viewers show it just fine.
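The thread raises a concern that counting pages by walking the tree could loop forever. Remembering every object already visited removes that risk. ISO 32000-2 allows each page only once in the Pages tree, so a second visit to any object can simply be treated as an error. That rejects both reference cycles and repeated /Kids entries like the ones in the sample document. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical, simplified in-memory model: PageTreeNode, PageTree and countPagesStrict are illustrative names, not poppler's Catalog or Object classes. It also identifies objects by object number only and ignores generation numbers.

```cpp
#include <optional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified page tree model (not poppler's classes):
// each node is keyed by object number and is either an intermediate
// /Pages node with /Kids, or a leaf /Page.
struct PageTreeNode {
    bool isPages = false;   // true for /Type /Pages, false for /Type /Page
    std::vector<int> kids;  // object numbers listed in /Kids (/Pages only)
};

using PageTree = std::unordered_map<int, PageTreeNode>;

// Counts leaf pages reachable from rootRef. Returns std::nullopt if an
// object is reached twice (a cycle, or a page listed more than once, which
// ISO 32000-2 disallows) or if /Kids points at a missing object.
std::optional<long long> countPagesStrict(const PageTree &tree, int rootRef) {
    std::unordered_set<int> visited;
    std::vector<int> stack{rootRef};  // explicit stack: no recursion depth limit
    long long pages = 0;
    while (!stack.empty()) {
        int ref = stack.back();
        stack.pop_back();
        if (!visited.insert(ref).second) {
            return std::nullopt;  // revisit: duplicated page or cycle
        }
        auto it = tree.find(ref);
        if (it == tree.end()) {
            return std::nullopt;  // dangling reference
        }
        const PageTreeNode &node = it->second;
        if (node.isPages) {
            for (int kid : node.kids) {
                stack.push_back(kid);
            }
        } else {
            ++pages;  // leaf /Page
        }
    }
    return pages;
}
```

Each object is expanded at most once, so the walk always terminates after work proportional to the number of objects plus /Kids entries. A more lenient variant, closer to the viewers that display the sample without complaint, could count a repeated /Page leaf instead of failing. Leaves have no /Kids, so cycles can only run through /Pages nodes.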