[poppler] Big image splash support

Martin (gzlist) gzlist at googlemail.com
Thu Aug 20 11:43:06 UTC 2020


I have a patchset to support writing images with more than 2GiB pixel
data using the splash backend, which raises a few questions worth
running past this list.

Rendering a pdf (using for instance the pdftoppm utility) allocates a
bitmap to draw into and assemble the output. The bitmap dimensions and
coordinates are of type int, and the product is also treated as an
int, both in the gmem allocation, and in drawing logic. On an LP64
system, this means the max image size is under 27500x27500 and quite a
bit less than could fit in memory.

Large outputs result is this error:

    $ pdftoppm -scale-to 32768 -png ~/sample.ai > /tmp/out.png
    Bogus memory allocation size
    $ file /tmp/out.png
    /tmp/out.png: PNG image data, 1 x 1, 8-bit/color RGB, non-interlaced

The logged line is from the allocator level check, and after there is
a slightly surprising fallback to 1 pixel square output. From looking
at other [allocator errors] this probably exists so  *some* output is
produced for documents with only some bogus content?

There is an argument that all widths, heights, and coordinates should
be size_t instead of int, but to limit the size and pain of the
change, keeping the current interfaces using int seems reasonable.

So two distinct changes to make:
* Allocations larger than 2GiB must be allowed
* Buffer offsets must use a type larger than int
Also, optionally:
* Handle failed allocations a little better

For allocation, what I have currently is (A) change use of int in
gmem.c to long so all allocations can be >2GiB on 64-bit machines.
Alternatively I could (B) find just the uses of the allocator that may
need large areas and make them use a new gmem function.

Do you prefer (A) or (B) here? Given existing [pdf deflate bombs]
there's probably not much DoS protection provided by the 2GiB limit?

For the splash code, what I've implemented and propose landing is
casting to size_t on index into a buffer, generally where
multiplication happens. The tradeoff is it's harder to be sure of
correctness - my method of finding where to make changes was a
combination of grep, and gdb with real world inputs.

Is this acceptable? It's possible this method will introduce memory
safety issues specifically for large output sizes.

Note the [cairo max size] is explicitly (1<<15)-1 for surface
dimensions, and not addressed by my changes.

Martin


[allocator errors]: Some problems currently blocked at gmalloc level?
https://gitlab.freedesktop.org/poppler/poppler/-/issues/532
https://gitlab.freedesktop.org/poppler/poppler/-/issues/825

[pdf deflate bombs]: PDF Deflate bombs may cause crashes or resource exhaustion
https://gitlab.freedesktop.org/poppler/poppler/-/issues/878

[cairo max size]: Limits for the cairo backed
https://gitlab.freedesktop.org/cairo/cairo/-/blob/1.16/src/cairo-image-surface.c#L59
There's been some work in pixman, but it's incomplete?
https://gitlab.freedesktop.org/pixman/pixman/-/issues/6


More information about the poppler mailing list