[Libreoffice-bugs] [Bug 131487] Words whose characters span multiple languages should not undergo spell checking

bugzilla-daemon at bugs.documentfoundation.org
Fri Jun 26 08:48:32 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=131487

--- Comment #8 from sergio.callegari at gmail.com ---
@Mihkel Tõnnov

> you said that "quell'" alone is not an Italian word - how is it currently handled by spellcheck, if used before an Italian word? Would 2a as described above work for Italian?

In Italian there are many cases of "elision" between two similar sounds. For
instance, the article "lo" loses its final "o" when preceding a noun that
starts with a vowel: rather than writing "lo ombrello" you write "l’ombrello".
Incidentally, this is the same thing that happens with "quello" and "albero",
which become "quell’albero" in my previous example. In this latter case
"quello" is not an article, but the rule is the same.

To the best of my understanding, these cases are handled by treating the two
words that the elision fuses in pronunciation as a single word for spell
checking.

Hence the spell checking dictionary contains "lo" and "ombrello" but also
"l’ombrello", and "quello" and "albero" but also "quell’albero". I do not know
the details, but I think that the spell checker handles this efficiently by
combining a base dictionary with an affix file whose rules extend the base
dictionary. In any case, this saves you from having to add elided forms like
"quell" to the dictionary, which would be wrong since they are not correct
words by themselves.
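
As a rough illustration of the idea above, here is a toy model in Python of a
base word list extended by an elision rule. It is only a sketch under stated
assumptions: the names BASE_WORDS, ELIDED_PREFIXES and is_accepted are
hypothetical, the word list is invented for the example, and this is not the
format or logic of the real Italian dictionary or its affix file.

```python
APOSTROPHES = ("'", "\u2019")  # ASCII and typographic apostrophe
VOWELS = set("aeiouàèéìòù")

# Hypothetical toy data, not the contents of the real Italian dictionary.
BASE_WORDS = {"lo", "ombrello", "quello", "albero", "libro"}
# Elided forms are not words by themselves; they are only valid when
# attached to a following vowel-initial word.
ELIDED_PREFIXES = {"l", "quell"}


def is_accepted(token: str) -> bool:
    """Return True if the toy checker accepts the token as a single word."""
    word = token.lower()
    if word in BASE_WORDS:
        return True
    for apostrophe in APOSTROPHES:
        prefix, sep, rest = word.partition(apostrophe)
        if (
            sep
            and prefix in ELIDED_PREFIXES
            and rest in BASE_WORDS  # also guarantees that rest is non-empty
            and rest[0] in VOWELS
        ):
            return True
    return False
```

With this model "l’ombrello" and "quell'albero" are accepted as whole words,
while "quell" and "quell'" alone are rejected, which is the behaviour that
would be lost if the apostrophe were treated as a word separator.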

This is why I think that it would be incorrect to consider the "’" a word
separator, at least in Italian, and why I think that 2a would not be OK.

To me, the simplest thing to do would be to keep the word separator exactly as
it is. Then, before passing a word to the spell checker, if its characters
belong to different languages, pretend that the language of the whole word is
"none" rather than the language of the first character.
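
The proposal above could be sketched as follows in Python. This is a minimal
sketch, not LibreOffice code: the function name language_for_spellcheck, the
NO_LANGUAGE constant and the per-character language tags such as "it" and "en"
are assumptions made for the example.

```python
from typing import Sequence

NO_LANGUAGE = "none"  # hypothetical marker meaning "do not spell check"


def language_for_spellcheck(char_languages: Sequence[str]) -> str:
    """Pick the language to hand to the spell checker for one word.

    char_languages holds one language tag per character of the word.
    If all characters share a language, that language is used; if the
    word mixes languages (or is empty), the word is treated as having
    no language instead of taking the language of its first character.
    """
    distinct = set(char_languages)
    if len(distinct) == 1:
        return next(iter(distinct))
    return NO_LANGUAGE
```

For example, a word whose characters are all tagged "it" is checked as
Italian, while a word tagged ["en", "en", "it"] yields "none" and is skipped.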
