[poppler] pdftops creates huge file with simple color background (attached examples)

Adrian Johnson ajohnson at redneon.com
Fri Jan 29 14:29:57 PST 2016


On 30/01/16 05:36, Pierre-Luc Samuel wrote:
> Hum, yeah RunLengthDecode doesn't seem to be the best algorithm for this
> kind of image.  Well, it's not really a good compression algorithm at
> all from what I see!
> 
> An interesting fact I found was that if I pass my 27 MB file to ps2ps
> (ghostscript ps2write device), I end up with a 1.7 MB file that is
> "/ASCII85Decode filter /LZWDecode filter".  I don't know much about
> these decoding algorithms, but it would be really nice if that kind of
> post-compression happened directly in poppler's pdftops.
> 
> I'd be willing to help if someone helped me figure it out.  I see
> poppler already has an LZWStream class; would it simply be a matter of
> plugging it in somewhere in PSOutputDev.cc, in place of or in addition
> to RunLengthDecode?

"pdftocairo -ps tux-yellow.pdf" creates a 112KB file.
"pdftocairo -ps -level2 tux-yellow.pdf" creates a 345KB file.

So you should be able to get significantly better compression out of
pdftops by using the /FlateDecode filter for PS level 3 and falling
back to /LZWDecode for level 2.
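
To illustrate why the background colour matters so much for run-length
data but not for Flate, here is a minimal Python sketch. It is not
poppler code: run_length_encode and run_length_decode are hypothetical
helpers that follow the PostScript RunLengthEncode byte format (a
length byte of 0-127 copies the next length+1 bytes, 129-255 repeats
the next byte 257-length times, 128 marks end of data), and the pixel
count is arbitrary. Flate is approximated with Python's zlib module.

```python
import zlib


def run_length_encode(data: bytes) -> bytes:
    """Encode bytes in the PostScript RunLengthEncode format (sketch)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # 129..255: repeat next byte `run` times
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a repeat starts (at most 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # 0..127: copy next length+1 bytes
            out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def run_length_decode(data: bytes) -> bytes:
    """Inverse of run_length_encode, used here only as a sanity check."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            break
        if length < 128:
            out += data[i:i + length + 1]
            i += length + 1
        else:
            out += bytes([data[i]]) * (257 - length)
            i += 1
    return bytes(out)


# Arbitrary number of background pixels, 3 bytes (RGB) per pixel.
pixels = 100_000
white = bytes([255, 255, 255]) * pixels
yellow = bytes([255, 194, 14]) * pixels

for name, raw in (("white", white), ("yellow", yellow)):
    rle = run_length_encode(raw)
    flate = zlib.compress(raw, 9)
    print(f"{name}: raw={len(raw)} runlength={len(rle)} flate={len(flate)}")
```

In the yellow buffer no two adjacent bytes are equal, so the run-length
encoder can only emit literal blocks and the output ends up slightly
larger than the input, while the white buffer collapses into long
runs. zlib finds the repeating 3-byte pattern in both cases.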

> 
> Pierre-Luc
> 
> On 01/27/2016 01:55 PM, William Bader wrote:
>> tux-yellow and tux-white both convert to a 2549x3299 RGB bitmap that
>> is RunLength compressed and ASCII85 encoded.
>>
>> The yellow file is larger than the white file because "255 194 14"
>> does not compress as well as "255 255 255".
>>
>> The original tux image was Flate encoded with /DecodeParms of
>> <</Predictor 15/Columns 512>>
>>
>> I am not a poppler maintainer, but I think that it should be possible
>> to add an option to do Flate compression.
>>
>> If you want to look at the code, open poppler/PSOutputDev.cc and
>> search for occurrences of /RunLengthDecode
>>
>> The "nothing" files are small because they paint the background by
>> drawing a box instead of by copying a bitmapped image.
>>
>> I think that when a PDF has several images on top of each other,
>> pdftops needs to convert the entire area to a bitmap even if some of
>> the parts were originally drawn with vector commands. The original
>> images have a bitmapped tux over a vector background, but pdftops
>> can't separate them and has to rasterize the entire page.
>>
>> Regards,
>>
>> William
>>
>>
>> To: poppler at lists.freedesktop.org
>> From: Pierre-Luc.Samuel at ticketmaster.com
>> Date: Tue, 26 Jan 2016 14:19:17 -0500
>> Subject: [poppler] pdftops creates huge file with simple color
>> background (attached examples)
>>
>> Hi poppler team,
>>  
>> I have an issue with pdftops version 0.39.0 with conversion of some 
>> specific templates to postscript.  I have created very simple use cases 
>> so that you can understand the issue.
>>  
>> pdftops tux-white.pdf
>> pdftops tux-yellow.pdf
>> ls -al *.ps
>> -rw-r--r-- 1   2816703 Jan 26 11:53 tux-white.ps
>> -rw-r--r-- 1  27576263 Jan 26 11:53 tux-yellow.ps
>>  
>> The size of the second PS is 27MB, but only the background color has 
>> changed.  This seems related to the fact that there is an image on the 
>> template, because if I remove the image, there is no significant size 
>> difference:
>>  
>> pdftops nothing-white.pdf
>> pdftops nothing-yellow.pdf
>> ls -al *.ps
>> -rw-r--r-- 1     11129 Jan 26 10:34 nothing-white.ps
>> -rw-r--r-- 1     11167 Jan 26 10:34 nothing-yellow.ps
>>  
>> Is this a known issue?
>>  
>> Thanks!
>> Pierre-Luc
>>
>> _______________________________________________ poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler