[Libreoffice] ICU bloat ...

Fri Jan 14 04:27:38 PST 2011

Hi there,

On Fri, 2011-01-07 at 12:22 -0600, Norbert Thiebaud wrote:
> >        Which makes me wonder: do we really need everything that is in that
> > beast ? pmap seems to suggest we use 84K out of the 13Mb on Linux:
> 
> Michael: icudata contains, among other things, all the supported
> utf16<->other-codepage conversions. If your locale is utf8 or iso8859-1/15
> (which is most likely in your case) then sure, you just need one or two
> of these conversion tables... if any at all (some conversions, like
> utf8<->utf16, are algorithmic)

	Sure. We have a patch for sal:

http://cgit.freedesktop.org/libreoffice/build/tree/patches/dev300/size-sal-textenc.diff

	sadly still not merged, since it needs re-testing on win32. It chops
a megabyte of this off of sal (exactly the same text encoding conversion
tables).

> libicudata also contains stuff about collation and locales...

	Right - but it also seems that some (much?) of this data is not
actually used :-) AFAICS we don't use the charset conversion data at
all, preferring the sal stuff. There are whole families of ICU API that
we simply never touch:

	'ucnv_' (char set conversion !?)
	'ures_'
	'unorm_'
	'utrans_'
	'u_shapeArabic'
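
	A rough way to double-check a list like the one above is to count
which ICU symbols our own libraries actually import. The sketch below is
an illustration only: it assumes you feed it the text output of
`nm -D --undefined-only` run on a library that links against ICU, and the
function name count_icu_families and the sample symbol names are made up
for the example, not taken from our tree.

```python
from collections import Counter

# API families named in the list above. Matching is by plain prefix, since
# ICU usually appends a version suffix to exported names (an assumption
# about how the library was built).
FAMILIES = ("ucnv_", "ures_", "unorm_", "utrans_", "u_shapeArabic")


def count_icu_families(nm_output: str) -> Counter:
    """Count undefined symbols per ICU API family in nm-style text."""
    counts = Counter({family: 0 for family in FAMILIES})
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols show up as "U <name>"; skip everything else.
        if len(fields) != 2 or fields[0] != "U":
            continue
        symbol = fields[1]
        for family in FAMILIES:
            if symbol.startswith(family):
                counts[family] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical sample input, only to show the expected shape.
    sample = (
        "                 U ucnv_open_44\n"
        "                 U u_strlen_44\n"
        "                 U ures_close_44\n"
    )
    for family, n in count_icu_families(sample).items():
        print(f"{family:15} {n}")
```

	A family that stays at zero across every library we ship is a
candidate for removal; a non-zero count shows where to look before
cutting anything.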

	So - I suspect we could hack some big chunks of code and data out of
this. The data is probably the biggest evil size-wise from a distribution
perspective: 5.5Mb (compressed) of our win32 download is data we don't
use [ one of the bigger lumps of pointlessness there ].

> either way libicudata is big, but there is not that much redundancy in
> it. it just covers an insanely large number of code pages (

	sure sure :-) and we don't need that AFAICS, since we don't use the
relevant APIs; and our internal ICU does not have to be a generically
useful resource for arbitrary programs (particularly on Win32).

	So I added an easy hack here:

http://wiki.documentfoundation.org/Development/Easy_Hacks#de-bloat_internal_ICU

	Thanks,

		Michael.

-- 
 michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot