[Libreoffice] RC4 / Windows size analysis ...
Steve Butler
sebutler at gmail.com
Fri Jan 28 02:29:26 PST 2011
Hi Michael,
On 28 January 2011 04:04, Michael Meeks <michael.meeks at novell.com> wrote:
>
>
> Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
> (see bootstrap/) but having MIT too is fine if you want.
The attached patch adds the copyright header from the template to
idxdict.cpp, although I had to change the year to 2011.
> > I have no idea how this would be integrated into the build process as I'm
> > not even sure where it is called from, but happy if someone wants to
> > take up the challenge and/or incorporate it as an installer process.
>
> So - the installer process is more exciting on Windows I think - we'll
> need to see how the setup_native/ tools are called and be inspired by
> that I think.
I think that in order to do any work on the Windows installer I would
have to work out how to get a Windows build environment set up.
I currently only have one set up on my Ubuntu machine.
> > The same set of files using th_gen_idx.pl took around 5 seconds (although
> > some basic fixups got it done to 3.5 seconds).
>
> Great - its trivial; indeed - it rather makes you wonder whether we
> need the indexes at all ? [ I wonder what they are good for, and/or what
> code loads and uses them ;-]. We may discover that in fact there is no
> need for them to be indexed - any chance of a dig around ?
I imagine my timings are a bit skewed by the machine I tested on and
by the number of times I ran it. I'm sure all the dictionaries were
well and truly in the buffer cache, so there was no I/O during the test.
On slower machines (are you targeting these?) or slower disks there is
a chance the index files may offer a performance improvement.
Here is the same test after I dropped all of my buffer cache:
real 0m2.300s
user 0m0.700s
sys 0m0.150s
> > These range from having the entry count incorrect, causing the index
> > process to miss a word (lots of these in some dictionaries), to having
> > words apparently duplicated either as the next entry, or sometimes a long
> > way apart.
>
> That is bad; we should mail the l10n list to ask them to have a look I
> suppose.
I wasn't aware there was such a list, and I can't find one on
freedesktop.org. Is it a LibreOffice-related l10n list, or are these
dictionaries sourced from another project?
> > I have not attempted to fix these dictionary issues, but if they are
> > serious it might be worth having a perl script that is able to validate
> > the dictionaries are internally consistent. Unfortunately, it would have
> > to use heuristics as the file format makes it difficult to tell in general
> > what kind of line is being processed.
>
> Right; we should validate them as we compile the index perhaps - or at
> least, look at the parser and see how it has traditionally interpreted
> them.
If a utility were written that could validate the files, would it be
possible to have commits rejected when it detects errors?
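
To make the idea concrete, here is a minimal sketch of such a validator.
It is not the existing th_gen_idx.pl or idxdict.cpp logic. It assumes a
layout of one encoding line followed by repeated "word|count" headers,
each followed by "count" meaning lines; that layout, and the names
validate_thesaurus and _parse_header, are assumptions made for
illustration. The header test is a heuristic, so a meaning line that
happens to look like "text|number" would be misread.

```python
import re


def _parse_header(line):
    """Heuristic: a header is exactly 'word|N' with N a decimal integer."""
    parts = line.rstrip("\n").split("|")
    if len(parts) == 2 and parts[0] and re.fullmatch(r"[0-9]+", parts[1]):
        return parts[0], int(parts[1])
    return None, 0


def validate_thesaurus(lines):
    """Return a list of (line_number, message) problems found in `lines`.

    Assumed layout (hypothetical): line 1 is the encoding, then each
    entry is a 'word|N' header followed by N meaning lines.
    """
    problems = []
    seen = {}  # headword -> line number of its first occurrence
    i = 1      # skip the encoding line at index 0
    while i < len(lines):
        lineno = i + 1
        word, count = _parse_header(lines[i])
        if word is None:
            # Probably a meaning line left over because a count was too low.
            problems.append((lineno, "expected 'word|count' header"))
            i += 1
            continue
        if word in seen:
            problems.append(
                (lineno, f"duplicate entry {word!r} (first at line {seen[word]})"))
        else:
            seen[word] = lineno
        taken = 0
        j = i + 1
        while taken < count and j < len(lines):
            if _parse_header(lines[j])[0] is not None:
                # Hit the next header early: the count was too high.
                problems.append(
                    (j + 1, f"entry {word!r} claims {count} meanings, found {taken}"))
                break
            taken += 1
            j += 1
        else:
            if taken < count:
                problems.append(
                    (lineno, f"file ends inside entry {word!r}"))
        i = j
    return problems
```

A commit hook or build step could run this over each dictionary file and
fail when the returned list is non-empty.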
> > Having multiple entries for a word when loaded into libreoffice?
> The native code thing is great; it'd be wonderful if you had some time
> to look at hooking it into the build process in dictionaries/ (?)
Yep... I will have to try to figure out how the build works first, though.
Back to the wiki; at least I've worked out how to make git work across
the multiple checkouts now.
--
Regards,
Steven Butler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copyright.patch
Type: application/octet-stream
Size: 1614 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20110128/0347eb88/attachment.obj>