[Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

Steven Butler sebutler at gmail.com
Mon Jan 31 12:45:11 PST 2011


On 1 February 2011 06:30, Caolán McNamara <caolanm at redhat.com> wrote:
> FWIW, I'm sure Nemeth would be interested if you e.g. wanted to create a
> reimpl of mythes that was faster than the original and perhaps simply
> designate the optimized version the new "mythes" version with an API/ABI
> change :-)

I don't think there is any need for an API or ABI change as I'm shying
away from an STL reimplementation.  If optimisation is desired
(probably not needed), reducing the string allocations by reading in
the whole index file certainly helps: loading the US dictionary went
from 0.046 seconds to 0.019 seconds with a hot cache.  The speedup is
similar with a cold cache, but I can't recall the numbers exactly -
something like 0.1 seconds down to 0.05 seconds.
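
A minimal sketch of the "read the whole index file at once" idea, not the
actual MyThes code.  The Index and IndexEntry types and both function names
are hypothetical, and the "word|offset" line format (with header lines that
contain no '|') is an assumption about the index file.  The point it shows
is that each entry keeps a pointer into one shared buffer instead of owning
a separately strdup'd string.

```cpp
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the MyThes API.
struct IndexEntry {
    const char* word;  // points into Index::buffer, no per-word allocation
    long offset;       // byte offset of the entry in the data file
};

struct Index {
    std::string buffer;               // whole index file, split in place
    std::vector<IndexEntry> entries;  // one entry per "word|offset" line

    Index() = default;
    // Entries point into `buffer`, so copying would leave dangling pointers.
    Index(const Index&) = delete;
    Index& operator=(const Index&) = delete;
};

// Read the file into a single buffer instead of allocating per line.
inline std::string read_whole_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Split the buffer in place.  Assumes lines look like "word|offset";
// lines without a '|' (such as header lines) are skipped.
inline void build_index(Index& idx, std::string contents) {
    idx.buffer = std::move(contents);
    idx.entries.clear();
    char* p = &idx.buffer[0];
    char* end = p + idx.buffer.size();
    while (p < end) {
        char* line_end = std::find(p, end, '\n');
        char* bar = std::find(p, line_end, '|');
        if (bar != line_end) {
            *bar = '\0';                            // terminate the word
            if (line_end != end) *line_end = '\0';  // terminate the number
            idx.entries.push_back({p, std::strtol(bar + 1, nullptr, 10)});
        }
        p = (line_end == end) ? end : line_end + 1;
    }
}
```

A caller would do something like build_index(idx, read_whole_file(path)),
where path names the index file, and then keep idx alive (and unmoved) for
as long as the entries are used.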

I thought it would be possible to use the STL algorithms to do the
binary search and/or use the map, but using all those strings and a
map takes considerably longer than all the strdups in the original (I
recall about 0.08 seconds to load the index using STL map).  I didn't
measure lookup time but it would be very similar.

Using STL vectors made it comparable, but then it turns out
binary_search only tells you whether an item exists, not its index,
which is kind of annoying. :)
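
For reference, std::lower_bound returns an iterator rather than a bool, so
the position can be recovered from it.  The sketch below is illustrative
only: find_index is a hypothetical helper, and it assumes the vector is
sorted in the same order that std::string's operator< uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: position of `key` in the sorted vector `words`,
// or -1 if it is not present.
inline std::ptrdiff_t find_index(const std::vector<std::string>& words,
                                 const std::string& key) {
    // lower_bound yields the first element that is not less than key.
    auto it = std::lower_bound(words.begin(), words.end(), key);
    if (it == words.end() || *it != key) {
        return -1;  // key would be inserted here, but it is not present
    }
    return it - words.begin();
}
```

With words holding "apple", "banana", "cherry", find_index(words, "banana")
gives 1 and find_index(words, "blueberry") gives -1.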

So at this point I think an STL rewrite would not result in a
performance improvement, so would be an academic exercise.

Regards,
Steven Butler

