[poppler] [PATCH] Catalog::getNumPages(): validate page count

Adrian Johnson ajohnson at redneon.com
Tue Sep 8 15:45:36 PDT 2015


On 08/09/15 23:03, Even Rouault wrote:
> On Tuesday 08 September 2015 14:43:07, Adrian Johnson wrote:
>> On 08/09/15 21:06, Even Rouault wrote:
>>> Hi,
>>>
>>> An excessively large page count may cause the gmallocn() call in
>>> Catalog::cachePageTree() to crash even if we call it with a low page number.
>>>
>>> Even
>>>
>>> +      // to avoid excessively large memory allocations later and avoid crashes
>>> +      // This is the maximum number of indirect objects as per ISO-32000:2008 (Table C-1)
>>
>> Table C-1 is a list of minimum limits for 32-bit readers.
> 
> Ah indeed. But they also state "Because Acrobat implementations are subject to
> these limits, applications producing PDF files are strongly advised to remain
> within them", so it might make sense to check that (even if Acrobat goes
> 64-bit, which is perhaps the case; but anyway, does an 8 million page PDF make
> sense?)

A page count limit does not make sense. A limit that may be appropriate
for a 64-bit 16GB desktop would not be appropriate on a 32-bit embedded
system with limited memory. It is better to just check for the
out-of-memory condition and report an error.
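A minimal sketch of "check for out of memory and report an error" in plain C++ follows. It is hypothetical: it does not use poppler's gmallocn(), and the names `allocPageCache` and `PageCacheEntry` are invented for illustration. It rejects a page count whose byte size would overflow and returns `nullptr` with an error message when the allocation fails, instead of imposing a fixed page limit.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical per-page cache entry; poppler's real per-page data differs.
struct PageCacheEntry {
    void *page; // would point at a parsed page object
    int ref;    // object number of the page dictionary
};

// Hypothetical helper: allocate a zero-initialised cache for numPages
// entries. Rather than capping the page count, it rejects sizes that
// would overflow and reports allocation failure so the caller can fail
// gracefully. Release the result with std::free().
PageCacheEntry *allocPageCache(int numPages) {
    if (numPages <= 0) {
        std::fprintf(stderr, "Invalid page count (%d)\n", numPages);
        return nullptr;
    }
    const std::size_t count = static_cast<std::size_t>(numPages);
    // Guard count * sizeof(PageCacheEntry) against size_t overflow,
    // which matters on 32-bit targets.
    if (count > SIZE_MAX / sizeof(PageCacheEntry)) {
        std::fprintf(stderr, "Page count (%d) too large to allocate\n", numPages);
        return nullptr;
    }
    void *mem = std::calloc(count, sizeof(PageCacheEntry));
    if (mem == nullptr) {
        std::fprintf(stderr, "Out of memory allocating cache for %d pages\n", numPages);
        return nullptr;
    }
    return static_cast<PageCacheEntry *>(mem);
}
```

For example, `allocPageCache(10)` returns a usable ten-entry array, while `allocPageCache(0)` prints an error and returns `nullptr`. On systems that overcommit memory (the Linux default), a very large `calloc()` may succeed and only fail once the pages are touched. This check therefore catches overflow and outright refusals, but it is not a complete guarantee.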

> 
>>
>>> +      // We could probably decrease that number again. PDFium for example uses 1 Mi
>>> +      else if (numPages > 8 * 1024 * 1024) {
>>> +        error(errSyntaxWarning, -1,
>>> +              "Page count ({0:d}) too big. Limiting number of reported pages to 8 Mi",
>>> +              numPages);
>>
>> Instead of imposing an arbitrary limit, we should just add a check for
>> gmallocn() returning NULL and print an error.
> 
> That would be another possibility. It just looked a bit more complicated to do
> right, without leaking memory, for someone not familiar with the code base.
> 
>>
>> For broken PDFs that report an invalid size (see bug 85140), we could
>> check whether the page count exceeds the number of objects in the XRef.
> 
> What would be the criterion to decide that a PDF is broken? Or do you mean we
> should always check that the reported page count is no bigger than the number
> of objects in the XRef? And in that case, should we limit the reported page
> count to the number of objects in the XRef, or just return 0 with an error?

Since you did not provide a sample PDF that demonstrates the problem, I
assumed that you have a broken PDF that claims to have a much higher
page count than the actual number of pages. If the PDF is not broken and
really does have more than 8 million pages, it makes no sense to limit
the page count, as this would prevent machines with sufficient memory
from being able to read the entire PDF.
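The XRef-based sanity check discussed in this thread, together with the choice of clamping versus returning 0, could look roughly like the hypothetical helper below. It assumes the caller already has the claimed page count and the XRef object count as plain ints. In poppler those values would come from the page tree root and the XRef; `sanitizePageCount` and its parameters are invented for illustration. The bound relies on the PDF rule that a page tree node's /Kids entries are indirect references, so every page dictionary occupies its own XRef entry.

```cpp
#include <cstdio>

// Hypothetical helper: because /Kids entries must be indirect references,
// each page is a separate object in the XRef, so a genuine page count can
// never exceed the number of XRef entries. Returns the page count to use;
// an impossible value is either clamped to the XRef size or rejected (0),
// depending on clampToXRef.
int sanitizePageCount(int claimedPages, int xrefNumObjects, bool clampToXRef) {
    if (claimedPages < 0) {
        std::fprintf(stderr, "Invalid page count (%d)\n", claimedPages);
        return 0;
    }
    if (claimedPages > xrefNumObjects) {
        std::fprintf(stderr,
                     "Page count (%d) exceeds number of objects in XRef (%d)\n",
                     claimedPages, xrefNumObjects);
        return clampToXRef ? xrefNumObjects : 0;
    }
    return claimedPages;
}
```

Unlike a fixed 8 Mi cap, this bound never rejects a valid file. A PDF that really has millions of pages also has at least that many objects, so only counts that cannot be genuine are affected.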
