[poppler] PdfToCairo question

William Bader williambader at hotmail.com
Wed Apr 15 14:16:16 PDT 2015


If you are mainly worried about file size, you could try running both pdftocairo and ps2pdf and taking the smaller file.
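A minimal sketch of that "run both and keep the smaller" idea. The helper names smaller_file and convert_smallest are hypothetical, and it assumes pdftocairo and ps2pdf are both on the PATH:

```shell
#!/bin/sh
# smaller_file A B: print the path of whichever file has fewer bytes
# (prints A on a tie). Hypothetical helper, not part of poppler or gs.
smaller_file() {
    size_a=$(( $(wc -c < "$1") ))
    size_b=$(( $(wc -c < "$2") ))
    if [ "$size_b" -lt "$size_a" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1"
    fi
}

# convert_smallest SRC DST: convert SRC with both tools, keep the smaller result.
convert_smallest() {
    src=$1
    dst=$2
    tmpdir=$(mktemp -d) || return 1
    pdftocairo -pdf "$src" "$tmpdir/cairo.pdf" &&
        ps2pdf "$src" "$tmpdir/gs.pdf" &&
        cp "$(smaller_file "$tmpdir/cairo.pdf" "$tmpdir/gs.pdf")" "$dst"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage would be something like: convert_smallest input.pdf output.pdf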
When I run pdffonts on the original, the output of ps2pdf and the output of pdftocairo, the PDF produced by pdftocairo has two additional fonts, HelveticaNeueLTStd-MdCn as Type 0C and HelveticaNeue-Light a second time as Type 0C.
pdftotext gives almost the same output for the original PDF and for the pdftocairo output (except for "fi" ligatures), so the size difference is probably not due to rasterized or vectorized text.
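The font and text comparison above can be repeated with something like the following sketch. The function name compare_pdfs is hypothetical; it only wraps pdffonts, pdftotext and diff:

```shell
#!/bin/sh
# compare_pdfs A B: print the pdffonts table for each file, then diff
# the text that pdftotext extracts from them.
# Returns diff's exit status (0 when the extracted text is identical).
compare_pdfs() {
    for f in "$1" "$2"; do
        printf '== fonts in %s ==\n' "$f"
        pdffonts "$f"
    done
    tmpdir=$(mktemp -d) || return 1
    pdftotext "$1" "$tmpdir/a.txt"
    pdftotext "$2" "$tmpdir/b.txt"
    diff "$tmpdir/a.txt" "$tmpdir/b.txt"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

If the diff shows only small differences such as ligatures, the text is most likely still stored as text in both files.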
Running pdftocairo on each page (using the -f and -l options) gives the page sizes 6.8 MB, 1.5 MB, 1.2 MB, and 1.5 MB, so the problem is probably on the first page.
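A rough sketch of that per-page measurement, assuming you already know the page range (pdfinfo reports the page count). The function name page_sizes is hypothetical:

```shell
#!/bin/sh
# page_sizes PDF FIRST LAST: convert each page on its own with
# pdftocairo -f N -l N and print "page bytes" for each one.
page_sizes() {
    pdf=$1
    page=$2
    last=$3
    tmp=$(mktemp) || return 1
    while [ "$page" -le "$last" ]; do
        if ! pdftocairo -pdf -f "$page" -l "$page" "$pdf" "$tmp"; then
            rm -f "$tmp"
            return 1
        fi
        printf '%s %s\n' "$page" "$(( $(wc -c < "$tmp") ))"
        page=$((page + 1))
    done
    rm -f "$tmp"
}
```

For the 4 page document, page_sizes original.pdf 1 4 would print one line per page, which makes an outlier page easy to spot.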
ps2pdf from ghostscript 9.16 turns the original 4 page document into a 1.2 MB pdf.
It is not necessary to run pdftops or pdf2ps first before running ps2pdf.

You can run ps2pdf directly on the PDF.
You can also process a mix of PS and PDF files or add additional options if you use a longer command like
gs -dCompatibilityLevel=1.4 -sPAPERSIZE=a4 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf -c .setpdfwrite -f *.ps *.pdf
gs also supports a number of distiller options to control how it changes the resolution of bitmapped images and how it handles fonts and colors.
http://www.ghostscript.com/doc/9.16/Ps2pdf.htm
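As an illustration only, a wrapper that passes a few of those distiller options might look like the sketch below. The function name shrink_pdf and the default of 150 dpi are assumptions made for the example, not recommended settings; check the page linked above for what each option does in your gs version.

```shell
#!/bin/sh
# shrink_pdf SRC DST [DPI]: hypothetical wrapper around gs pdfwrite.
# Downsamples color images to DPI (150 if not given) and keeps fonts
# embedded as subsets. The values are illustrative only.
shrink_pdf() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.4 \
        -dDownsampleColorImages=true \
        -dColorImageResolution="${3:-150}" \
        -dEmbedAllFonts=true -dSubsetFonts=true \
        -sOutputFile="$2" "$1"
}
```

For example: shrink_pdf original.pdf smaller.pdf 200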
William

Date: Wed, 15 Apr 2015 11:43:07 -0400
From: matthew.hartley.jones at gmail.com
To: poppler at lists.freedesktop.org
Subject: [poppler] PdfToCairo question

Occasionally, pdftocairo creates an output pdf that is 5 to 10 times larger than the input pdf. That causes processing time and file transfer time issues for us when the pdf was already several megabytes in size. I've included a link to an example pdf that starts off as 2 MB but balloons out to 10 MB after running through pdftocairo. I've done some research on the cairo format and examined the output using the Acrobat 9 filesize audit tool, and I think a lot of the extra size is coming from the vectorized version of some very complex images. I am using poppler-utils version 0.18.4 on Ubuntu 12.04.

Is there any workaround you can suggest that would decrease the filesize without visible drops in quality? It's not a requirement that the output be in the cairo format, so a solution may involve somehow undoing the vectorization. I've had some success with running "pdftops" then "ps2pdf" on the document before running "pdftocairo", but I don't understand why the resulting document is smaller or whether it is actually solving my problem.
Original 2MB pdf: https://dl.dropboxusercontent.com/u/11610849/Lenovo%20x220%20-%20datasheet.pdf
The same PDF, after going through pdftocairo -pdf, now 10MB: https://dl.dropboxusercontent.com/u/11610849/AFTER_PDFTOCAIRO%20-%20Lenono%20x220%20-%20datasheet.pdf
Any input would be helpful, thanks for your time.
Matt Jones
