[poppler] [PATCH] Catalog::getNumPages(): validate page count

Leonard Rosenthol lrosenth at adobe.com
Tue Sep 8 06:52:44 PDT 2015


Acrobat/Reader have supported 64-bit integers for close to a decade now, long before we actually delivered a 64-bit app (which we just did on the Mac with Acrobat DC).

As for millions of pages in a PDF: I've never seen 8 million, but I have seen a (real-world!) 2-million-page file.

Leonard




On 9/8/15, 9:33 AM, "poppler on behalf of Even Rouault" <poppler-bounces at lists.freedesktop.org on behalf of even.rouault at spatialys.com> wrote:

>On Tuesday, 8 September 2015 14:43:07, Adrian Johnson wrote:
>> On 08/09/15 21:06, Even Rouault wrote:
>> > Hi,
>> > 
>> > An excessively large page count may cause the gmallocn() in
>> > Catalog::cachePageTree() to crash, even if we call it with a low
>> > page number.
>> > 
>> > Even
>> >
>> >+      // to avoid too huge memory allocations layer and avoid crashes
>> >+      // This is the maximum number of indirect objects as per ISO-32000:2008 (Table C-1)
>> 
>> Table C-1 is a list of minimum limits for 32-bit readers.
>
>Ah indeed. But they also state "Because Acrobat implementations are subject to 
>these limits, applications producing PDF files are strongly advised to remain 
>within them", so it might make sense to check that (even if Acrobat goes 
>64-bit, which is perhaps the case, but anyway, does an 8-million-page PDF make 
>sense?)
>
>> 
>> >+      // We could probably decrease that number again. PDFium for example uses 1 Mi
>> >+      else if (numPages > 8 * 1024 * 1024) {
>> >+        error(errSyntaxWarning, -1,
>> >+              "Page count ({0:d}) too big. Limiting number of reported pages to 8 Mi",
>> >+              numPages);
>> 
>> Instead of imposing an arbitrary limit we should just add a check for
>> gmallocn() returning NULL and print an error.
>
>That would be another possibility. It just looked a bit more complicated to do 
>right, without leaking memory, for someone not familiar with the code base.
>
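
A minimal sketch of the "check the allocation instead of capping the count" idea discussed above. It assumes an allocator that reports failure by returning a null pointer, which the real gmallocn() may not do. tryAllocArray(), PageCache, initPageCache() and freePageCache() are hypothetical names for illustration, not poppler APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>

// Hypothetical helper, NOT poppler's gmallocn(): returns nullptr on a
// non-positive count, on multiplication overflow, or when malloc fails.
static void *tryAllocArray(long long count, std::size_t elemSize) {
    if (count <= 0 || elemSize == 0) {
        return nullptr;
    }
    const std::size_t maxBytes = std::numeric_limits<std::size_t>::max();
    if (static_cast<unsigned long long>(count) > maxBytes / elemSize) {
        return nullptr; // count * elemSize would not fit in size_t
    }
    return std::malloc(static_cast<std::size_t>(count) * elemSize);
}

// Hypothetical stand-in for the per-page cache arrays.
struct PageCache {
    void **pages = nullptr;
    int size = 0;
};

// Returns false and leaves the cache untouched if the allocation fails,
// so the caller can print an error and has nothing to clean up.
static bool initPageCache(PageCache &cache, int numPages) {
    assert(cache.pages == nullptr); // must not overwrite a live allocation
    void *mem = tryAllocArray(numPages, sizeof(void *));
    if (mem == nullptr) {
        return false;
    }
    cache.pages = static_cast<void **>(mem);
    for (int i = 0; i < numPages; ++i) {
        cache.pages[i] = nullptr;
    }
    cache.size = numPages;
    return true;
}

static void freePageCache(PageCache &cache) {
    std::free(cache.pages);
    cache.pages = nullptr;
    cache.size = 0;
}
```

Note that on systems that overcommit memory, malloc() can succeed for a very large request and fail only when the memory is touched, so a null check alone may not catch every bogus page count.
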
>> 
>> For broken PDFs that report an invalid size (see bug 85140) we could
>> check if the page count exceeds the number of objects in the XRef.
>
>What would be the criterion to decide that a PDF is broken? Or do you mean we 
>should always check that the reported page count is no bigger than the number 
>of objects in the XRef? And in that case, should we limit the reported page 
>count to the number of objects in the XRef, or just return 0 with an error?
>
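
The XRef comparison asked about above could be sketched as follows. This shows only the "clamp to the object count" variant; returning 0 with an error instead would be a one-line change. It assumes each page is a distinct indirect object, so a well-formed file cannot have more pages than XRef entries. PageCountCheck and validatePageCount() are hypothetical names, and the object count is passed in as a plain int rather than read from any real poppler class.

```cpp
#include <cassert>

// Hypothetical result type, not part of poppler's Catalog.
struct PageCountCheck {
    int numPages; // value to report to callers
    bool trusted; // false if the declared count was rejected or clamped
};

// declaredCount: the page count read from the page tree root.
// xrefObjects:   number of entries in the cross-reference table.
static PageCountCheck validatePageCount(long long declaredCount, int xrefObjects) {
    assert(xrefObjects >= 0);
    if (declaredCount < 0) {
        return {0, false}; // negative counts are never valid
    }
    if (declaredCount > xrefObjects) {
        return {xrefObjects, false}; // clamp to the upper bound
    }
    return {static_cast<int>(declaredCount), true};
}
```

A caller would emit a warning whenever trusted is false. Whether a damaged or reconstructed XRef makes this bound unreliable is left open here.
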
>> 
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>-- 
>Spatialys - Geospatial professional services
>http://www.spatialys.com

