[Libreoffice] RC4 / Windows size analysis ...

Steve Butler sebutler at gmail.com
Fri Jan 28 02:29:26 PST 2011

Hi Michael,

On 28 January 2011 04:04, Michael Meeks <michael.meeks at novell.com> wrote:
>        Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
> (see bootstrap/) but having MIT too is fine if you want.

This patch adds the (c) header from the template to idxdict.cpp,
although I had to change the year to 2011.

> > I have no idea how this would be integrated into the build process as I'm
> > not even sure where it is called from, but happy if someone wants to
> > take up the challenge and/or incorporate it as an installer process.
>        So - the installer process is more exciting on Windows I think - we'll
> need to see how the setup_native/ tools are called and be inspired by
> that I think.

I think that in order to do any work on the Windows installer I would
have to work out how to set up a Windows compile environment.
I currently only have a build environment set up on my Ubuntu machine.

> > The same set of files using th_gen_idx.pl took around 5 seconds (although
> > some basic fixups got it done to 3.5 seconds).
>        Great - its trivial; indeed - it rather makes you wonder whether we
> need the indexes at all ? [ I wonder what they are good for, and/or what
> code loads and uses them ;-]. We may discover that in fact there is no
> need for them to be indexed - any chance of a dig around ?

I imagine my timings are a bit skewed by the machine I tested on, and
the number of times I ran it.  I'm sure all the dictionaries were well
and truly in buffer cache so there was no I/O for the test.

On slower machines (are you targeting these?) or slower disks, there is
a chance the index files may offer a performance improvement.
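
For illustration, here is a minimal sketch of what a word-to-offset index
buys on a slow disk: a lookup can seek straight to an entry instead of
scanning the whole file. It assumes the .dat layout is an encoding line
followed by "word|count" headers, each followed by that many meaning
lines starting with "("; build_index and lookup are hypothetical names,
not the existing idxdict.cpp or th_gen_idx.pl code.

```python
import io


def build_index(data: bytes) -> dict:
    """Map each headword to the byte offset of its 'word|count' line.

    Assumed layout (an assumption, not taken from the real tools):
    first line is the encoding, meaning lines start with '('.
    """
    index = {}
    stream = io.BytesIO(data)
    stream.readline()                      # skip the encoding line
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"("):          # meaning line, not a headword
            continue
        word, sep, _ = line.rstrip(b"\r\n").rpartition(b"|")
        if sep:
            index.setdefault(word, offset)  # keep the first occurrence
    return index


def lookup(data: bytes, index: dict, word: bytes) -> list:
    """Seek to the entry for word and return its meaning lines."""
    offset = index.get(word)
    if offset is None:
        return []
    stream = io.BytesIO(data)
    stream.seek(offset)
    header = stream.readline().rstrip(b"\r\n")
    count = int(header.rpartition(b"|")[2].decode("ascii"))
    return [stream.readline().rstrip(b"\r\n") for _ in range(count)]
```

With everything already in the buffer cache a full scan is cheap, which
would be consistent with the small timings above; the seek only starts
to matter when the read itself is slow.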

Here is the same test after I dropped all my buffer cache:
real    0m2.300s
user    0m0.700s
sys     0m0.150s

> > These range from having the entry count incorrect, causing the index
> > process to miss a word (lots of these in some dictionaries), to having
> > words apparently duplicated either as the next entry, or sometimes a long
> > way apart.
>        That is bad; we should mail the l10n list to ask them to have a look I
> suppose.

I wasn't aware there was such a list and I can't find one on
freedesktop.org. Is it a LibreOffice-related l10n list, or are these
dictionaries sourced from another project?

> > I have not attempted to fix these dictionary issues, but if they are
> > serious it might be worth having a perl script that is able to validate
> > the dictionaries are internally consistent.  Unfortunately, it would have
> > to use heuristics as the file format makes it difficult to tell in general
> > what kind of line is being processed.
>        Right; we should validate them as we compile the index perhaps - or at
> least, look at the parser and see how it has traditionally interpreted
> them.

If a utility were written that could validate the files, would it be
possible to have commits rejected when it detects errors?
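
As a rough sketch of the kind of check such a utility might make,
covering the two problems mentioned above (wrong entry counts and
duplicated words): it assumes an encoding line followed by "word|count"
headers, each followed by that many meaning lines starting with "(".
The function name and the heuristics are hypothetical, and it does not
claim to match how the existing parser interprets the files.

```python
def validate_thesaurus(lines):
    """Return a list of (line_number, message) problems.

    Assumed layout (an assumption, not taken from the real parser):
      line 1: encoding name
      then repeated: "word|N" followed by N lines starting with "(".
    """
    problems = []
    seen = {}      # headword -> line number of its first header
    i = 1          # index 0 is the encoding line
    while i < len(lines):
        header = lines[i].rstrip("\n")
        lineno = i + 1
        word, sep, count_text = header.rpartition("|")
        looks_like_header = (
            sep and count_text.strip().isdigit() and not header.startswith("(")
        )
        if not looks_like_header:
            problems.append((lineno, "expected 'word|count' header"))
            i += 1
            continue
        if word in seen:
            problems.append((lineno, f"duplicate of line {seen[word]}: {word}"))
        else:
            seen[word] = lineno
        declared = int(count_text)
        # Heuristic: meaning lines are the following lines that start with "(".
        actual = 0
        while i + 1 + actual < len(lines) and lines[i + 1 + actual].startswith("("):
            actual += 1
        if actual != declared:
            problems.append(
                (lineno, f"{word}: declared {declared} meanings, found {actual}")
            )
        i += 1 + actual
    return problems
```

A commit hook could call this and refuse the commit when the returned
list is non-empty; whether the repositories allow such a hook is a
separate question.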

> > Having multiple entries for a word when loaded into libreoffice?

>        The native code thing is great; it'd be wonderful if you had some time
> to look at hooking it into the build process in dictionaries/ (?)

Yep... I will have to try to figure out how the build works though.
Back to the wiki; at least I've worked out how to make git work across
the multiple checkouts now.

Steven Butler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copyright.patch
Type: application/octet-stream
Size: 1614 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20110128/0347eb88/attachment.obj>
