[Libreoffice] ICU bloat ...

Samphan Raruenrom samphan at osdev.co.th
Sat Jan 15 19:07:33 PST 2011


You can easily minimize the ICU footprint by tweaking the ICU build options.
See http://userguide.icu-project.org/packaging
The result will look like this:
http://site.icu-project.org/charts/icu4c-footprint

On Fri, Jan 14, 2011 at 7:27 PM, Michael Meeks <michael.meeks at novell.com> wrote:

> Hi there,
>
> On Fri, 2011-01-07 at 12:22 -0600, Norbert Thiebaud wrote:
> > >        Which makes me wonder: do we really need everything that is in
> > > that beast? pmap seems to suggest we use 84K out of the 13Mb on Linux:
> >
> > Michael: icudata contains, among other things, all the supported
> > utf16<->other-codepage conversions. If your locale is utf8 or iso8859-1/15
> > (which is most likely in your case) then sure, you just need one or two
> > of these conversion tables... if any at all (some conversions, like
> > utf8<->utf16, are algorithmic)
>
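For illustration, here is a minimal sketch (in Python, not taken from ICU or sal) of why a utf8<->utf16 conversion needs no table data: both encodings are bit-level repackagings of the same code point. The function name is made up for this example, and it skips the overlong-sequence and surrogate checks that a production converter would perform.

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Convert UTF-8 bytes to UTF-16 code units using only bit arithmetic.

    Hypothetical helper for illustration; no lookup table is involved.
    """
    units: List[int] = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # The lead byte tells us the sequence length and the first payload bits.
        if b0 < 0x80:
            cp, n = b0, 1
        elif b0 >> 5 == 0b110:
            cp, n = b0 & 0x1F, 2
        elif b0 >> 4 == 0b1110:
            cp, n = b0 & 0x0F, 3
        elif b0 >> 3 == 0b11110:
            cp, n = b0 & 0x07, 4
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        if i + n > len(data):
            raise ValueError("truncated UTF-8 sequence at offset %d" % i)
        # Each continuation byte (10xxxxxx) contributes six more bits.
        for b in data[i + 1:i + n]:
            if b >> 6 != 0b10:
                raise ValueError("invalid UTF-8 continuation byte")
            cp = (cp << 6) | (b & 0x3F)
        i += n
        if cp < 0x10000:
            # Basic Multilingual Plane: one code unit.
            units.append(cp)
        else:
            # Supplementary planes: split into a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units
```

Most legacy code pages have no such formula; each one needs its own mapping table, which is where the data size comes from.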
>        Sure. We have a patch for sal:
>
>
> http://cgit.freedesktop.org/libreoffice/build/tree/patches/dev300/size-sal-textenc.diff
>
>        sadly still not merged, since it needs re-testing on win32 - that
> chops a megabyte of this off of sal (exactly the same text encoding
> conversion tables).
>
> > libicudata also contains stuff about collation and locales...
>
>        Right - but it also seems that some (much?) of this data is not
> actually used :-) AFAICS we don't use the charset conversion data at
> all, preferring the sal stuff. There are whole fields of API that are
> simply not touched from ICU:
>
>        'ucnv_' (char set conversion !?)
>        'ures_'
>        'unorm_'
>        'utrans_'
>        'u_shapeArabic'
>
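The message does not say how this list was produced, so the following is an assumption: one way to check it is to scan the undefined symbols that the LibreOffice libraries import and count those belonging to each ICU API family. A minimal Python sketch, assuming input lines in the format printed by `nm -D --undefined-only`; the helper name and the sample symbols are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# API families named in the message; u_shapeArabic is a single function.
ICU_PREFIXES: Tuple[str, ...] = (
    "ucnv_", "ures_", "unorm_", "utrans_", "u_shapeArabic",
)


def count_icu_families(nm_lines: Iterable[str],
                       prefixes: Tuple[str, ...] = ICU_PREFIXES) -> Dict[str, int]:
    """Count undefined (imported) symbols per ICU API prefix.

    Hypothetical helper: expects lines such as
    "                 U ucol_open_44" and ignores everything else.
    """
    counts = {p: 0 for p in prefixes}
    for line in nm_lines:
        fields = line.split()
        # Undefined symbols are listed as "U <name>" with no address column.
        if len(fields) < 2 or fields[-2] != "U":
            continue
        name = fields[-1]
        for p in prefixes:
            if name.startswith(p):
                counts[p] += 1
                break
    return counts


if __name__ == "__main__":
    # Made-up sample input, not real LibreOffice output.
    sample = [
        "                 U ucol_open_44",
        "                 U ubrk_next_44",
        "0000000000001130 T local_function",
    ]
    for prefix, n in count_icu_families(sample).items():
        print(prefix, n)
```

A family whose count stays at zero across every library that links against ICU would be a candidate for removal from the internal build.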
>        So - I suspect we could hack some big chunks of code, and data out
> of this: the data is the biggest evil size-wise from a distribution
> perspective I suspect: 5.5Mb compressed of our win32 download is data we
> don't use [ one of the bigger lumps of pointlessness there ].
>
> > either way libicudata is big, but there is not that much redundancy in
> > it. It just covers an insanely large number of code pages (
>
>        sure sure :-) and we don't need that AFAICS, since we don't use the
> relevant APIs; and our internal ICU does not have to be a generic useful
> resource for abstract programs (particularly on Win32).
>
>        So I added an easy hack here:
>
>
> http://wiki.documentfoundation.org/Development/Easy_Hacks#de-bloat_internal_ICU
>
>        Thanks,
>
>                Michael.
>
> --
>  michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot
>



-- 
_/|\_ Samphan Raruenrom. Open Source Development Co., Ltd.
Tel: +66 38 311816, Fax: +66 38 773128, http://www.osdev.co.th/