[cairo] Size of PDF with lots of images
simon.sapin at exyr.org
Thu Jan 16 05:54:26 PST 2014
On 16/01/2014 03:44, Behdad Esfahbod wrote:
> Back in 2007 Carl and I developed Slippy at GUADEC to do our cairo slides. I
> have used it since for many presentations. It's a pycairo-based tool where
> you express slides as Python functions. It's very handy, especially if you
> want to use cairo drawing in your slides.
> Back in the days, if I had a huge background image, it was replicated in each
> slide, so I was getting, like, 240MB PDFs for a simple presentation.
> Fortunately that has long been fixed.
> Now, for my GLyphy talk, the source images are 14MB, but the generated
> PDF is 18MB. Does anyone feel like taking a look?
>  http://github.com/behdad/slippy
>  https://vimeo.com/83732058
>  https://github.com/behdad/slippy/tree/master/glyphy
>  http://behdad.org/glyphy_slides.pdf
Cairo’s default way of storing raster images in PDF is raw pixel data
compressed as deflate with zlib’s default compression level.
Even though PNG also uses deflate, PDF’s encoding is not PNG so the
images are decompressed and re-compressed. I’m not too surprised to see
the size increase. You could try a build of cairo that uses zlib’s
maximum compression level and see what happens. Of course, this is a
compromise with compression speed. Maybe it’s worth adding API to change
this.

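To get a rough feel for the trade-off between compression level and size, here is a minimal sketch using only Python's standard zlib module. It does not go through cairo at all: the pixel buffer is a synthetic stand-in (a grey gradient, 4 bytes per pixel), and deflate_sizes is a name made up for this example. Real slide images will give different ratios.

```python
import zlib


def deflate_sizes(raw_pixels, levels):
    """Return {level: compressed size in bytes} for a raw pixel buffer."""
    return {level: len(zlib.compress(raw_pixels, level)) for level in levels}


# Synthetic stand-in for decoded image data (an assumption for this sketch):
# 256x256 pixels, 4 bytes per pixel, a horizontal grey gradient.
width, height = 256, 256
row = b"".join(bytes((x, x, x, 255)) for x in range(width))
raw_pixels = row * height

# Compare zlib's default level with its maximum level.
levels = (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_BEST_COMPRESSION)
sizes = deflate_sizes(raw_pixels, levels)

print("raw:", len(raw_pixels), "bytes")
print("default level:", sizes[zlib.Z_DEFAULT_COMPRESSION], "bytes")
print("maximum level:", sizes[zlib.Z_BEST_COMPRESSION], "bytes")
```

Running the same comparison on the decoded pixels of the actual slide images would show how much a maximum-level build could save, before patching cairo.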
If your images were in a format that the PDF backend supports (which
includes JPEG but not PNG), you could use cairo_surface_set_mime_data()
to have cairo store the original image data (almost) as-is in PDF,
without re-compressing. Although I expect that lossy JPEG may not look
nice for these specific images.
pycairo does not support Surface.set_mime_data(), but cairocffi does.
It also includes some glue code to load images (including JPEG)
into an ImageSurface, using GDK-PixBuf.
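
Here is a rough sketch of the JPEG route with cairocffi, assuming cairocffi and GDK-PixBuf are installed. The function names is_jpeg and write_pdf_with_jpeg are made up for this example, and the page is simply sized to the image in points. It is untested against Slippy itself.

```python
def is_jpeg(data):
    """Cheap check for the JPEG start-of-image marker (FF D8 FF)."""
    return data[:3] == b"\xff\xd8\xff"


def write_pdf_with_jpeg(jpeg_bytes, target):
    """Write a one-page PDF showing the JPEG, attaching the original bytes.

    `target` is a filename or a writable binary file object.
    """
    if not is_jpeg(jpeg_bytes):
        raise ValueError("expected JPEG data")

    # Imported lazily so that is_jpeg() is usable without cairocffi.
    import cairocffi
    from cairocffi import pixbuf

    # Decode through GDK-PixBuf to get an ImageSurface of the right size.
    image, _format_name = pixbuf.decode_to_image_surface(jpeg_bytes)

    # Attach the original JPEG data so the PDF backend can store it
    # instead of re-compressing the decoded pixels.
    image.set_mime_data("image/jpeg", jpeg_bytes)

    pdf = cairocffi.PDFSurface(target, image.get_width(), image.get_height())
    context = cairocffi.Context(pdf)
    context.set_source_surface(image)
    context.paint()
    pdf.finish()
```

It would be called with the contents of a JPEG file and an output path, for example write_pdf_with_jpeg(jpeg_bytes, "out.pdf"), where both the data and the filename are placeholders.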