[Libreoffice] RC4 / Windows size analysis ...

Fri Jan 28 02:29:26 PST 2011

Hi Michael,

On 28 January 2011 04:04, Michael Meeks <michael.meeks at novell.com> wrote:
>
>
>        Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
> (see bootstrap/) but having MIT too is fine if you want.

This patch adds the (c) header from the template to idxdict.cpp,
although I had to tweak the year to 2011.

> > I have no idea how this would be integrated into the build process as I'm
> > not even sure where it is called from, but happy if someone wants to
> > take up the challenge and/or incorporate it as an installer process.
>
>        So - the installer process is more exciting on Windows I think - we'll
> need to see how the setup_native/ tools are called and be inspired by
> that I think.

I think in order to do any work on the Windows installer I would have
to work out how to get a Windows compile environment set up.
I currently only have one set up on my Ubuntu machine.

> > The same set of files using th_gen_idx.pl took around 5 seconds (although
> > some basic fixups got it done to 3.5 seconds).
>
>        Great - its trivial; indeed - it rather makes you wonder whether we
> need the indexes at all ? [ I wonder what they are good for, and/or what
> code loads and uses them ;-]. We may discover that in fact there is no
> need for them to be indexed - any chance of a dig around ?

I imagine my timings are a bit skewed by the machine I tested on, and
the number of times I ran it.  I'm sure all the dictionaries were well
and truly in buffer cache so there was no I/O for the test.

On slower machines (are you targeting these?) or slower disks there is
a chance the index files may offer a performance improvement.

Here is the same test after I dropped all my buffer cache:
real    0m2.300s
user    0m0.700s
sys     0m0.150s
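
To illustrate why an index could matter when nothing is cached, here is
a minimal sketch. It assumes the thesaurus file starts with an encoding
line and then holds entries of the form "word|N" followed by N meaning
lines, and that the index simply maps each word to the byte offset of
its header. That layout and the names build_index and lookup are my own
assumptions for illustration; this is not how idxdict.cpp or
th_gen_idx.pl are written.

```python
import io


def build_index(data: bytes) -> dict:
    """Map each entry word to the byte offset of its header line.

    Assumes a well-formed file: an encoding line, then "word|N"
    headers each followed by exactly N meaning lines.
    """
    index = {}
    stream = io.BytesIO(data)
    stream.readline()  # skip the encoding line
    while True:
        offset = stream.tell()
        header = stream.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between entries
        word, _, count = header.rstrip(b"\r\n").rpartition(b"|")
        index.setdefault(word, offset)  # keep the first occurrence
        for _ in range(int(count)):
            stream.readline()  # skip this entry's meaning lines
    return index


def lookup(data: bytes, index: dict, word: bytes) -> list:
    """Seek straight to an entry and return its meaning lines."""
    stream = io.BytesIO(data)
    stream.seek(index[word])
    header = stream.readline().rstrip(b"\r\n")
    count = int(header.rpartition(b"|")[2])
    return [stream.readline().rstrip(b"\r\n").decode() for _ in range(count)]
```

With a real file the seek would replace a scan from the top, which is
where a slow disk would notice the difference.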

> > These range from having the entry count incorrect, causing the index
> > process to miss a word (lots of these in some dictionaries), to having
> > words apparently duplicated either as the next entry, or sometimes a long
> > way apart.
>
>        That is bad; we should mail the l10n list to ask them to have a look I
> suppose.

I wasn't aware there was such a list and I can't find one on
freedesktop.org - is it a LibreOffice-related l10n list, or are these
dictionaries sourced from another project?

> > I have not attempted to fix these dictionary issues, but if they are
> > serious it might be worth having a perl script that is able to validate
> > the dictionaries are internally consistent.  Unfortunately, it would have
> > to use heuristics as the file format makes it difficult to tell in general
> > what kind of line is being processed.
>
>        Right; we should validate them as we compile the index perhaps - or at
> least, look at the parser and see how it has traditionally interpreted
> them.

If a utility were written that can validate the files, would it be
possible to have commits rejected when it detects errors?
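
As a rough sketch of what such a check might look like (in Python here
rather than perl, purely for brevity): it assumes line 1 names the
encoding and every entry is a "word|N" header followed by N meaning
lines. That layout, and the function name validate, are assumptions of
mine rather than anything confirmed in this thread. The checks are
heuristic, as noted above: a meaning line that happens to look like
"word|N" will be flagged as a suspect header.

```python
import re

# A header is assumed to be a word, one "|", then a decimal count.
HEADER = re.compile(r"^([^|]+)\|(\d+)$")


def validate(lines):
    """Return (line_number, message) pairs for suspected problems.

    `lines` is the file content split into lines without newlines.
    """
    problems = []
    first_seen = {}  # word -> line number of its first header
    i = 1            # index 0 is the encoding line
    while i < len(lines):
        match = HEADER.match(lines[i])
        if match is None:
            problems.append(
                (i + 1, "expected 'word|count'; previous count may be too low"))
            i += 1
            continue
        word, count = match.group(1), int(match.group(2))
        if word in first_seen:
            problems.append(
                (i + 1, f"duplicate entry {word!r}, first on line {first_seen[word]}"))
        else:
            first_seen[word] = i + 1
        body = lines[i + 1 : i + 1 + count]
        if len(body) < count:
            problems.append((i + 1, f"{word!r} declares {count} lines, file ends early"))
        for offset, meaning in enumerate(body):
            # Heuristic: a header swallowed as a meaning line suggests
            # the declared count is too high.
            if HEADER.match(meaning):
                problems.append(
                    (i + 2 + offset, f"looks like a header inside {word!r}"))
        i += 1 + count
    return problems
```

A commit hook could then refuse the change whenever validate() returns a
non-empty list for any dictionary touched by the commit.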

> > Having multiple entries for a word when loaded into libreoffice?

>        The native code thing is great; it'd be wonderful if you had some time
> to look at hooking it into the build process in dictionaries/ (?)

Yep... I will have to try to figure out how the build works though.
Back to the wiki; at least I've realised how to make git work across
the multiple checkouts now.
--
Regards,
Steven Butler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copyright.patch
Type: application/octet-stream
Size: 1614 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20110128/0347eb88/attachment.obj>