[Libreoffice] Windows installer size reduction effort - the compression theory

Wols Lists antlists at youngman.org.uk
Mon Feb 28 10:04:10 PST 2011


On 28/02/11 12:16, Michael Meeks wrote:
> Hi Kami,
> 
> 	Wow - I'm so sorry, I missed your sexy mail for a week; that sucks -
> over-much merging on another branch :-) Tor has been on FTO, and
> Fridrich with his head down releasing 3.3.1 (and the Novell LibreOffice
> product) - which in part explains the lack of response to you (and
> Steven) - which sucks - sorry guys.

Finally found a few minutes to try and catch up - so jumping in late
myself ...
> 
> On Tue, 2011-02-22 at 17:32 +0100, Kálmán „KAMI” Szalai wrote:
>> I ran a few tests to figure out what is the best method for the installset
> 
> 	These look great, nice spreadsheet.
> 
>> So I went in the other direction: what if I increase the efficiency of the
>> LZMA compression in makecab? I found that we can use the .Set
>> CompressionMemory=21 setting. This setting produces 83.91% of the original
>> installer size and
> 
> 	Oh - nice :-) that solves the install-space problem as well.

If I understand correctly, imho this is the best approach too.
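
For anyone who wants to try it, that directive just goes into the .ddf
file that drives makecab, roughly like this (a sketch from memory rather
than our real build files, so check the exact names against the makecab
documentation; the file list is only illustrative):

    .Set CompressionType=LZX
    .Set Compress=ON
    .Set CompressionMemory=21   ; 2^21-byte window, the largest makecab accepts
    .Set CabinetNameTemplate=libreoffice*.cab
    sw.dll
    svx.dll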

Note that it's a bad idea to try to compress an already-compressed file.
Depending on the relative efficiency of the two algorithms, it's very
easy for the second compression to actually *increase* the file size. So
compressing the cabs and packing them into the download with no further
compression makes the most sense in theory.

Plus, in return for minimal or negative gain, the second compression
pass will also take a LOT of time.
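
You can see the effect with a few lines of Python - a toy illustration
with zlib standing in for both compressors, not the actual cab/installer
tooling:

    import zlib

    # Something highly redundant, like the text and XML inside an installset.
    original = b"LibreOffice installer payload " * 10000

    once = zlib.compress(original, 9)
    twice = zlib.compress(once, 9)   # second pass over already-compressed data

    print(len(original), len(once), len(twice))
    # The first pass shrinks the data dramatically; the second pass finds no
    # redundancy left to exploit and typically comes out slightly *larger*
    # once its own headers and checksum are added.
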
> 
>> In the gray section I tried a special approach where I uncompressed
>> every zip container (ODF, JAR, ZIP, etc.) in the installset so that
>> every file contains only stored data, with no compression. In this way
>> I was able to gain 15 MB more, but it requires re-zipping at the end
>> of the installation process, which may make it too complex and time
>> consuming. Please check the attached document, and if you want to go
>> with it, apply the attached patches.
> 
> 	Oh ! that sounds fun :-) 15Mb for that. Actually on master - we could
> use flat ODF files and get the same result (for some on-disk size
> growth) without having to re-compress, and they're dead fast in master.
> 
> 	So - personally, I find it amazing that ZIP is ever better than LZMA -
> it is a much older compression algorithm, and this flies in the face of
> everything I've seen in practice with it [ eg. we use LZMA for our RPMs
> in openSUSE these days AFAIR, better than bzip2, which in turn was
> better than .tgz (essentially zip) ].
> 
Why the surprise? How long has Huffman been around - 30 years? 40? And
for symbol-by-symbol prefix coding you CANNOT do better than Huffman
(there are other equally good algorithms, though). The holy grail is an
algorithm that is as efficient as Huffman, very quick to decompress, and
fairly quick to compress.
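
If anyone wants to see how close Huffman gets to the entropy limit,
here's a throwaway Python sketch (nothing to do with our build tree -
the input string is obviously just an example):

    import heapq, math
    from collections import Counter

    text = b"abracadabra abracadabra"
    freq = Counter(text)

    # Build the Huffman tree with a min-heap of (weight, tie-break, node).
    heap = [(w, i, {"sym": s}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, {"left": a, "right": b}))
        tie += 1

    # A symbol's code length is its depth in the tree.
    def lengths(node, depth=0):
        if "sym" in node:
            return {node["sym"]: max(depth, 1)}
        out = lengths(node["left"], depth + 1)
        out.update(lengths(node["right"], depth + 1))
        return out

    bits = lengths(heap[0][2])
    total = sum(freq.values())
    avg = sum(freq[s] * bits[s] for s in freq) / total
    entropy = -sum(w / total * math.log2(w / total) for w in freq.values())
    print("Huffman %.3f bits/symbol vs entropy %.3f" % (avg, entropy))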

When I tried to back up a hard disk over ethernet (dd if=/dev/sda
of=//samba/archive.hdi), it was direly slow over a 10Mbit card. So I
tried to pipe it through bzip2 to speed up the transfer - it was *even*
*slower*! Why do you think LZMA is better than zip - because it's
faster? Or because it's more efficient? I don't know much about the
details of either, but I'd be surprised if zip *wasn't* capable of very
good compression. It's just that as you crank up the compression
efficiency, you also crank up the time taken to do the compressing -
witness bzip2's inability to saturate even my 10Mbit network card.

(I think we need to add compression time to KAMI's nice table :-)
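
Something like this would do for rough numbers - a sketch using Python
3's standard-library codecs (zlib, bz2, lzma) rather than the actual
makecab/NSIS tooling KAMI measured, so treat the output as indicative
only:

    import bz2, lzma, sys, time, zlib

    # Point it at any reasonably large file, e.g. one of the cabs.
    payload = open(sys.argv[1], "rb").read()

    for name, compress in [("zlib -9", lambda d: zlib.compress(d, 9)),
                           ("bzip2 -9", lambda d: bz2.compress(d, 9)),
                           ("lzma", lzma.compress)]:
        start = time.time()
        out = compress(payload)
        secs = time.time() - start
        print("%-8s %5.1f%% of original, %6.2fs" %
              (name, 100.0 * len(out) / len(payload), secs))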

Cheers,
Wol

