[Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

Steve Butler sebutler at gmail.com
Sat Jan 29 03:45:21 PST 2011

Hi Michael,

>        Then 'mythes' seems to be used in lingucomponent/ somewhere - I suppose
> that is where to be digging for the user code. I suspect if we can read
> and index this file in two seconds - and it is used in response to user
> input - there may not really be a lot of value in indexing it ahead of
> time, but ... ;-) worth playing with that.

I haven't looked at this yet; I thought it would be more useful first to
write a script that analyzes the existing thesaurus files, so that those
errors get looked at.

I was going to raise the idea of not using the index at all, to see
what reception it gets, but I think you may have been suggesting
something similar: are the index files even useful on modern hardware?

With the C++ code, I can populate the en_US index in memory from the
.dat file in 0.287 s after dropping all caches, and in 0.188 s when the
cache is hot.
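For anyone who wants to reproduce numbers like these, here is a rough sketch of a timing helper. It is only an illustration, not the code used for the figures above, and `elapsedMs` is a hypothetical name. On Linux the cold-cache case can be set up beforehand with `sync; echo 3 > /proc/sys/vm/drop_caches` (as root); running the same measurement a second time gives the hot-cache figure.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the wall-clock time it
// took, in milliseconds. steady_clock is used because it is monotonic, so the
// result is not disturbed by system clock adjustments during the run.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would be along the lines of `double ms = elapsedMs([&] { /* build the index from the .dat file */ });`, called once right after dropping caches and once more with the file already cached.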

I admit that my desktop is pretty quick, though: 4 cores, SATA II
drives, etc.

If the thesaurus is only loaded when the user pops it up, then
couldn't mythes be taught to generate its own in-memory index
from the dictionary and not bother with an index file at all?
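A minimal sketch of that idea, assuming the MyThes .dat layout: an encoding line, then entries made of a `word|count` line followed by `count` meaning lines. `LazyThesaurus`, `meaningLines` and `indexBuilt` are hypothetical names, not the mythes API. The index maps each headword to the byte offset of its first meaning line and is only built on the first lookup, so nothing is paid until the thesaurus is actually opened.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch (not the mythes API): builds its word -> offset index
// in memory on first use, so no .idx file is needed.
class LazyThesaurus {
public:
    explicit LazyThesaurus(std::string datPath) : path_(std::move(datPath)) {}

    bool indexBuilt() const { return built_; }

    // Returns the raw meaning lines for `word`, or an empty vector if absent.
    std::vector<std::string> meaningLines(const std::string& word) {
        if (!built_) buildIndex();  // the full scan happens on the first lookup only
        std::vector<std::string> lines;
        const auto it = index_.find(word);
        if (it == index_.end()) return lines;
        std::ifstream dat(path_, std::ios::binary);
        dat.seekg(it->second.offset);  // jump straight to the first meaning line
        std::string line;
        for (int i = 0; i < it->second.count && std::getline(dat, line); ++i)
            lines.push_back(line);
        return lines;
    }

private:
    struct Entry {
        std::streamoff offset;  // byte offset of the entry's first meaning line
        int count;              // number of meaning lines that follow
    };

    void buildIndex() {
        std::ifstream dat(path_, std::ios::binary);
        std::string line;
        std::getline(dat, line);  // skip the encoding line (e.g. "UTF-8")
        while (std::getline(dat, line)) {
            const std::size_t bar = line.find('|');
            if (bar == std::string::npos) continue;  // skip lines without a '|'
            // Assumes a well-formed numeric count after the '|'.
            const int count = std::stoi(line.substr(bar + 1));
            index_.emplace(line.substr(0, bar), Entry{dat.tellg(), count});
            // Skip over the meaning lines; only their start offset is stored.
            for (int i = 0; i < count && std::getline(dat, line); ++i) {}
        }
        built_ = true;
    }

    std::string path_;
    std::unordered_map<std::string, Entry> index_;
    bool built_ = false;
};
```

The scan cost is paid once per process; after that a lookup is a hash-table probe plus one seek and a few line reads.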

BTW, if I did that, I'd probably do some major surgery on mythes and
just use the STL: it is basically doing C-style memory management and
processing, and I think I would screw it up if I started messing with
it. The only problem with simplifying it with STL constructs is that I
would want to change the interface (std::string vs char *), and maybe
use STL vectors for the list of synonyms, etc.
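A sketch of what an STL-flavoured interface might look like for a single meaning line, assuming the first `|`-separated field is the part of speech, as in `(noun)|angstrom|angstrom unit|A`. `Meaning` and `parseMeaning` are hypothetical names, not part of mythes. The point is that `std::string` and `std::vector` own their memory, so callers no longer need a matching cleanup call for hand-allocated `char *` arrays.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical STL-based meaning record: members own their memory, so there
// is no separate cleanup function to call after a lookup.
struct Meaning {
    std::string partOfSpeech;           // first field, e.g. "(noun)"
    std::vector<std::string> synonyms;  // remaining non-empty '|'-separated fields
};

// Splits one .dat meaning line on '|'. Assumes the first field is the part
// of speech; empty fields (e.g. from a trailing '|') are dropped.
Meaning parseMeaning(const std::string& line) {
    Meaning m;
    std::istringstream fields(line);
    std::string field;
    bool first = true;
    while (std::getline(fields, field, '|')) {
        if (first) {
            m.partOfSpeech = field;
            first = false;
        } else if (!field.empty()) {
            m.synonyms.push_back(field);
        }
    }
    return m;
}
```

A lookup could then return `std::vector<Meaning>` by value, and the caller simply lets it go out of scope.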

By this stage it's not looking much like mythes anymore ...

Steven Butler
