spell checker issue

Németh László nemeth.lacko at gmail.com
Fri Oct 26 04:16:09 PDT 2012


Hi,

2012/10/25 Caolán McNamara <caolanm at redhat.com>:
> On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
>> Hi,
>>
>> Adding a simple new item to the en_US.dic, like
>>
>> men's
>>
>> will extend the dictionary. The biggest plus in the American English
>> dictionary of LibreOffice is the morphological data (also based on
>> Kevin's data and maybe WordNet) for stemming and morphological
>> generation in thesaurus suggestions, see the attached conversion
>> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
>
> So basically one attractive route to go would be to build our dictionary
> at LibreOffice build time ourselves from wordnet +
> custom-libreoffice-words patch + that script. Which would give us
> something we can easily sync whenever wordnet gets updated without
> losing the extra morphological data. Or are there any gotchas with doing
> that?

Only a small part of WordNet (the list of irregular forms) is used by
the script. But the LibreOffice thesaurus is based on the full WordNet,
so it would be nice to add thesaurus generation to the build process as
well. That would let us add some attractive thesaurus improvements too,
like Unicode symbols as synonyms, e.g. alpha -> α, skull -> ☠, as in
the Hungarian thesaurus.
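A rough sketch of what a build-time step adding symbol synonyms could look like, assuming the MyThes .dat layout used by the LibreOffice thesauri (an encoding line, then "word|meaning-count" followed by one "(part-of-speech)|synonym|..." line per meaning). The SYMBOLS table, the "(symbol)" label and the sample synonyms are hypothetical illustrations, not WordNet data or the Hungarian thesaurus' actual convention; the companion .idx index file is not produced here.

```python
"""Hypothetical sketch: emit MyThes-style thesaurus entries with extra
Unicode symbol synonyms (assumed .dat layout, not LibreOffice's build code)."""

# Hypothetical symbol synonyms, in the spirit of the examples above.
SYMBOLS = {
    "alpha": "α",
    "skull": "☠",
}


def mythes_entry(word: str, meanings: list[tuple[str, list[str]]]) -> str:
    """Format one entry: 'word|count', then one '(pos)|syn|syn...' line per meaning."""
    if word in SYMBOLS:
        # Add the symbol as an extra meaning; '(symbol)' is an assumed label.
        meanings = meanings + [("(symbol)", [SYMBOLS[word]])]
    lines = [f"{word}|{len(meanings)}"]
    for pos, synonyms in meanings:
        lines.append("|".join([pos, *synonyms]))
    return "\n".join(lines)


def build_dat(entries: dict[str, list[tuple[str, list[str]]]]) -> str:
    """Build a whole .dat body: the encoding line first, then entries sorted by headword."""
    parts = ["UTF-8"]
    for word in sorted(entries):
        parts.append(mythes_entry(word, entries[word]))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    # Made-up sample input, only to show the output shape.
    sample = {
        "skull": [("(noun)", ["cranium"])],
        "alpha": [("(noun)", ["letter"])],
    }
    print(build_dat(sample), end="")
```

Keeping the symbols in a separate table means they survive when the WordNet-derived part is regenerated from a newer WordNet release.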

Gotchas: there were some manual fixes (documented in
README_en_US.txt) to handle Unicode apostrophes and ligatures.
Adding a small list of the most urgent words would be easier for me.
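A minimal sketch of merging such a list of urgent words into a Hunspell .dic file, assuming only the basic .dic layout (first line: approximate entry count; then one "word[/FLAGS]" entry per line, optionally followed by whitespace-separated morphological fields). The apostrophe and ligature normalization is a hypothetical stand-in for the manual fixes in README_en_US.txt, not a copy of them; escaped slashes in words are ignored.

```python
"""Hypothetical sketch: merge urgent words into Hunspell .dic text,
normalizing typographic apostrophes and ligatures first."""
import unicodedata


def normalize(word: str) -> str:
    # Assumed convention: map the typographic apostrophe U+2019 to ASCII "'".
    word = word.replace("\u2019", "'")
    # NFKC expands compatibility ligatures such as U+FB01 into "fi".
    return unicodedata.normalize("NFKC", word)


def merge_words(dic_text: str, new_words: list[str]) -> str:
    """Return .dic text with new_words appended (if absent) and the count line updated."""
    lines = dic_text.splitlines()
    # Skip the count line and any blank lines.
    entries = [e for e in lines[1:] if e.strip()]
    # Compare on the bare word: drop morphological fields, then "/FLAGS".
    known = {e.split()[0].split("/")[0] for e in entries}
    for word in new_words:
        word = normalize(word)
        if word and word not in known:
            # New words get no affix flags, so Hunspell accepts only this exact form.
            entries.append(word)
            known.add(word)
    return "\n".join([str(len(entries)), *entries]) + "\n"


if __name__ == "__main__":
    dic = "2\nman/SM\nwoman/SM\n"
    print(merge_words(dic, ["men\u2019s", "man"]), end="")
```

With the sample above, "men\u2019s" is stored as "men's" and the count line becomes 3, while "man" is skipped because it is already listed (with flags).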

I also tried to find an old OpenOffice.org issue about quality
analysis/extension of the (American) English dictionary, but I found
only the one about the en-GB-oed dictionary for international
organizations, see
https://issues.apache.org/ooo/show_bug.cgi?id=51093,
http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László


>
> C.
>

