[Libreoffice] libicudata ICU data info

Eike Rathke erack at redhat.com
Thu Dec 15 14:44:29 PST 2011


Hi Mike,

On Wednesday, 2011-12-14 07:48:49 -0800, Mike Whiteley wrote:

> The current libicudata is about 15MB, but it doesn't need
> to be that large.  We can probably get it down to 5MB
> or less. I took a bit of time on this.
> 
> 1) The configuration files needed to modify what is actually
> in icudata are NOT included in the package in libreoffice's repository.
> A real "source" package can be downloaded, which is what we'll
> need to do if we want to customize this library.

There's an easier way by using the ICU data library customizer available
at http://apps.icu-project.org/datacustom/


> 2) The bulk of the data in (our non-source package) icudata
> comes from an input file ./source/data/in/icudt44l.dat.  This
> file can be removed which causes the resulting icudata library to
> go from 15MB to 4.4MB.

The configurable data lib is 13945 KB in size. We can't remove that in
its entirety. We need at least the following (numbers taken from the
ICU 4.4 customizer page,
http://apps.icu-project.org/datacustom/ICUData44.html):
* Break Iterator (534 KB)
* Collators (4830 KB)
* Transliterators (308 KB)

Of "Miscellaneous Data (4282 KB)" we'd only need parts.

Currently quite safe to remove are:
* Charset Mapping Tables (3469 KB)
* Rule Based Number Format (275 KB)
* Formatting, Display Names and Other Localized Data (572 KB)

The resulting lib would be 10197 KB (including miscellaneous) or
5916 KB (miscellaneous removed). So we could gain between ~4MB and
~8MB. Given that only systems where ICU doesn't already exist
(Windows and?) would benefit from this, it's surely a benevolent task
for some merciful soul ;-)
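
As a quick sanity check of the savings quoted above, here is a minimal
Python sketch. Only the three KB totals come from the customizer figures
cited in this mail; the constant and function names are made up for
illustration, and 1 MB is assumed to be 1024 KB.

```python
# Sizes in KB as quoted in this mail (ICU 4.4 data customizer figures).
FULL_KB = 13945          # complete configurable data library
WITH_MISC_KB = 10197     # after removals, miscellaneous data kept
WITHOUT_MISC_KB = 5916   # after removals, miscellaneous data dropped too


def saving_mb(full_kb: int, stripped_kb: int) -> float:
    """Return the size reduction in MB, assuming 1 MB = 1024 KB."""
    return (full_kb - stripped_kb) / 1024


if __name__ == "__main__":
    variants = (("misc kept", WITH_MISC_KB), ("misc removed", WITHOUT_MISC_KB))
    for label, kb in variants:
        print(f"{label}: {kb} KB, saves {saving_mb(FULL_KB, kb):.1f} MB")
```

This gives roughly 3.7 MB and 7.8 MB, which matches the "~4MB to ~8MB"
range. The sketch deliberately uses the reported totals rather than
summing the individual categories.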


> HELP:  We should try this solution first.  Will someone who knows
> more about icudata please see if a library built this way
> is enough for what libreoffice needs?

The only way to be sure is to exchange the library in the build
environment, build, run and test.


> 3) There are two icudata packages in my repository.  Probably
> one of them can be deleted (also, these are not pure source
> packages anyways).

Which repository? If you mean the external source tarballs downloaded
during the build, then one can probably be removed; at least LibO 3.4
and later use ICU 4.4.2, if ICU is built internally at all.


> 5) Keep in mind my knowledge of icudata is still very limited,
> and this information is only from me reading their web pages for
> 20 min, and me messing with code for another 20 minutes.

With good results :-)

> Anyways, I just thought this would be helpful.

Sure, thanks. Would be nice if you could try out stripped-down versions
of the library and report back the results.

  Eike

-- 
LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
GnuPG key 0x293C05FD : 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD