[Libreoffice] fix for twofold suffix stripping + compound in hunspell

Arno Teigseth arnotixe at gmail.com
Sun Apr 24 19:31:45 PDT 2011


I've just hacked the hunspell sources I got from CVS so that I could get
rid of this hunspell bug:

The issue was that hunspell would accept words with one suffix+compound,
but not two suffixes+compound. Patch attached in the above link.


Since most words in Quichua have stem+suffix_level1+suffix_level2
+lots_of_compounds, this was a showstopper for my spellchecker.

At first I wrote a generator script that would create all the different
inflections and stick them into a dictionary, but it was a bad hack and very
slow, with a dictionary file of 170000+ entries for only some 3000
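
To illustrate why the generator approach grows so quickly, here is a minimal
sketch of that kind of expansion. It is not the original script: the function
name expand(), the max_clitics limit, and the way the example word is cut
into suffix lists are all assumptions made for illustration only.

```python
from itertools import product


def expand(stems, level1, level2, clitics, max_clitics=2):
    """Yield stem + level-1 suffix + level-2 suffix + a chain of clitics.

    Hypothetical helper: every combination is written out in full, which is
    the brute-force alternative to letting hunspell strip the affixes.
    """
    for stem, s1, s2 in product(stems, level1, level2):
        base = stem + s1 + s2
        # Append every ordered chain of 0..max_clitics trailing elements.
        for n in range(max_clitics + 1):
            for chain in product(clitics, repeat=n):
                yield base + "".join(chain)


# Assumed segmentation of the example word, for illustration only.
stems = ["mikuna"]
level1 = ["", "kuy"]      # "" means "no suffix at this level"
level2 = ["", "kuna"]
clitics = ["ta", "ka", "pash"]

words = sorted(set(expand(stems, level1, level2, clitics, max_clitics=3)))
print(len(words), "forms generated from", len(stems), "stem")
```

Even with one stem, two optional suffixes and three trailing elements, the
list already runs to well over a hundred entries, so a few thousand stems
with realistic suffix tables blow up fast.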

But now analysis benefits too :D

hunspell -d qu_EC -m   
mikunakuykunatakapash  pa:mikunakuy st:mikuna # base plur #
Nounification "amongst us" pa:kuna st:kuna pa:ta st:ta pa:ka st:ka

[For the curious the word means something like "for to the big feasts,

The patch is just two lines, one for each of two functions in
affixmgr.cxx, compound_check() and compound_check_morph().

For LO, I think only the one in AffixMgr::compound_check() is needed; I
haven't seen the morphological analysis being used anywhere in LO...

Arno Teigseth