[Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

Steve Butler sebutler at gmail.com
Sat Jan 29 03:45:21 PST 2011


Hi Michael,

>
>        Then 'mythes' seems to be used in lingucomponent/ somewhere - I suppose
> that is where to be digging for the user code. I suspect if we can read
> and index this file in two seconds - and it is used in response to user
> input - there may not really be a lot of value in indexing it ahead of
> time, but ... ;-) worth playing with that.

I haven't looked at this yet, because I thought a script to analyze
the existing thesaurus files would be more useful first, so that those
errors get looked at.

I'd like to put your idea of not using the index at all up for
discussion and see how it is received, though I think you may have been
suggesting something similar yourself:
are the index files even useful on modern hardware?

With the C++ code, I can populate the en_US index in memory from the
.dat file in 0.287 s after dropping all caches, and in 0.188 s when the
cache is hot.

I admit my desktop is pretty quick, though: 4 cores, SATA II drives,
etc.

If the thesaurus is only loaded when the user pops it up, couldn't
mythes be taught to build its own in-memory index from the .dat file
and not bother with an index file at all?
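A minimal sketch of what such an in-memory index could look like,
assuming the usual MyThes .dat layout: an encoding line, then entries
of the form "word|N", each followed by N meaning lines such as
"(noun)|syn1|syn2". ThesIndex and buildIndex are hypothetical names,
not part of the mythes API; the index simply maps each headword to the
byte offset of its entry, which is roughly what the .idx file provides.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>  // for std::istringstream in the usage example
#include <string>
#include <unordered_map>

// Hypothetical in-memory replacement for the .idx file: maps each headword
// to the byte offset of its "word|N" line in the .dat stream.
using ThesIndex = std::unordered_map<std::string, std::streamoff>;

// Assumes the MyThes .dat layout: an encoding line, then "word|N" lines,
// each followed by N meaning lines like "(noun)|syn1|syn2".
ThesIndex buildIndex(std::istream& dat) {
    ThesIndex index;
    std::string line;
    std::getline(dat, line);  // skip the encoding line (e.g. "UTF-8")
    for (;;) {
        std::streamoff offset = dat.tellg();  // start of the next headword line
        if (!std::getline(dat, line)) break;
        std::size_t bar = line.find('|');
        if (bar == std::string::npos) continue;  // malformed line: skip it
        long count = std::strtol(line.c_str() + bar + 1, nullptr, 10);
        index.emplace(line.substr(0, bar), offset);
        // Skip the meaning lines so they are not mistaken for headwords.
        for (long i = 0; i < count; ++i) {
            if (!std::getline(dat, line)) break;
        }
    }
    return index;
}
```

For example, building from a std::istringstream holding
"UTF-8\nhappy|1\n(adj)|glad|joyful\n" yields one entry, "happy" at
offset 6. With a real file, open it as a std::ifstream in binary mode
so the offsets from tellg() can be passed straight back to seekg().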

BTW, if I did that I'd probably do some major surgery on mythes and
just use the STL. It basically does C-style memory management and
processing, and I think I would screw it up if I started messing with
it. The only catch with simplifying it using STL constructs is that I
would want to change the interface (std::string instead of char *),
maybe use STL vectors for the list of synonyms, etc.
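A rough sketch of the kind of STL-based interface meant here, again
assuming the MyThes .dat layout. Meaning, splitBars and readEntry are
hypothetical names: readEntry seeks to an entry's offset (for instance
one taken from an in-memory index) and returns std::string and
std::vector values, rather than char * based structures that need
manual cleanup.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical STL-based result type: one Meaning per "(pos)|syn|syn" line.
struct Meaning {
    std::string partOfSpeech;           // first field, e.g. "(noun)"
    std::vector<std::string> synonyms;  // remaining '|'-separated fields
};

// Split a line on '|' into its fields.
std::vector<std::string> splitBars(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '|')) fields.push_back(field);
    return fields;
}

// Read the entry whose "word|N" line starts at `offset` in the .dat stream.
// Returns an empty vector if the offset does not point at a valid entry.
std::vector<Meaning> readEntry(std::istream& dat, std::streamoff offset) {
    std::vector<Meaning> meanings;
    dat.clear();  // reset eof/fail bits left over from earlier reads
    dat.seekg(offset, std::ios_base::beg);
    std::string line;
    if (!std::getline(dat, line)) return meanings;
    std::size_t bar = line.find('|');
    if (bar == std::string::npos) return meanings;
    long count = std::strtol(line.c_str() + bar + 1, nullptr, 10);
    for (long i = 0; i < count && std::getline(dat, line); ++i) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        std::vector<std::string> fields = splitBars(line);
        if (fields.empty()) continue;
        Meaning m;
        m.partOfSpeech = fields.front();
        m.synonyms.assign(fields.begin() + 1, fields.end());
        meanings.push_back(std::move(m));
    }
    return meanings;
}
```

The caller owns plain values, so there is no equivalent of a separate
cleanup call after each lookup; the trade-off is that this is a
different interface from the existing char * one, which is exactly the
compatibility concern above.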

By this stage it's not looking much like mythes anymore ...

Regards
Steven Butler

