[Libreoffice-bugs] [Bug 138502] New: Spellchecker problems with multiple languages and custom languages

bugzilla-daemon at bugs.documentfoundation.org
Thu Nov 26 04:04:47 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=138502

            Bug ID: 138502
           Summary: Spellchecker problems with multiple languages and
                    custom languages
           Product: LibreOffice
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: medium
         Component: Linguistic
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: ariel18 at trashmail.com
                CC: sophi at libreoffice.org

Description:
Suppose I am writing an English text containing many German words, or a German
text containing many English ones. There is no obvious way to use a
spellchecker without manually labeling each language switch.

Suppose I write my own .aff file for a language. It seems I can set a
user-defined dictionary, but I cannot find any way to set a user-defined .aff
file. This makes it difficult to develop and test new dictionaries in
LibreOffice.

Steps to Reproduce:
1. Write a document containing words in two or more languages (where one
language may or may not be supported)
2. Try to use spellcheck


Actual Results:
I may want the spellchecker to accept any word in the de_DE.dic file,
inflected according to the de_DE.aff file, AND any word in the en_GB.dic file,
inflected according to the en_GB.aff file. Currently, however, I cannot, unless
I explicitly tell the document which parts are in which language
(Tools > Language > For Selection), which is more tedious than spellchecking
manually in each of the two language modes.

I realize that using multiple languages would increase my false-negative rate,
since misspellings that happened to be words in the other language would not be
picked up. That's acceptable; it'd be much better than the huge false-positive
rate you get when spellchecking German as English or vice-versa.

To avoid this error-rate issue, I could add the minor-language words to a
user-defined dictionary; this often works well.

However, in user-defined dictionaries, I can only give inflection rules by
analogy to existing words *in the default document language*. Many words in
German inflect in ways that English words do not, and vice-versa. I therefore
have to add every possible inflection of each second-language word as a
separate user-defined dictionary entry, or the spellchecker won't work. This is
very tedious.

Expected Results:
Potential solutions:
Potential solution 1: I could manually amalgamate the de_DE and en_GB files,
but that would be tedious (inflection categories have what are essentially
one-capital-letter variable names!). Also, while there's a system for adding
user-defined dictionaries, there is no way I can see to add a user-defined .aff
file. So it seems I'd have to pretend my hybrid file was an existing language!
And I get the error-rate issue. This solution seems poor.

Potential solution 2: Since many extensions supply new dictionary+aff-file
pairs, an extension or built-in function allowing users to add their own custom
pairs seems like it should be possible, but I don't think it exists. The page
https://wiki.documentfoundation.org/Development/Dictionaries
gives no instructions beyond asking the developers.

Potential solution 2.5: An option to semi-automatically merge existing
dictionary+aff pairs into a custom merged dic+aff pair for use as in PS2
above (while leaving the original languages intact). The error-rate issue
still occurs unless I pare down the auto-generated file.
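The core difficulty with PS1 and PS2.5 is that both .aff files use short flag names, so the same letter can mean different things in each language. Below is a minimal sketch of a merge step, assuming (hypothetically) that each .aff file has been reduced to a table mapping single-character flags to opaque rule lines. Real .aff files may also declare FLAG long, FLAG num or FLAG UTF-8, and their SFX/PFX lines repeat the flag name, so a real tool would have to rewrite those lines as well; this sketch ignores both points. Clashing flags from the second file are renamed to unused characters, and the same renaming is then applied to that language's .dic entries.

```python
import string

# Hypothetical, simplified model of an .aff file: flag -> list of rule lines.
AffTable = dict[str, list[str]]


def merge_flags(base: AffTable, other: AffTable) -> tuple[AffTable, dict[str, str]]:
    """Merge two affix tables, renaming flags in `other` that clash with `base`.

    Returns the merged table and a map from each of `other`'s flags to the
    flag it uses in the merged table.
    """
    merged = dict(base)
    # Candidate replacement flags: characters used by neither input table.
    free = [c for c in string.ascii_letters + string.digits
            if c not in base and c not in other]
    renames: dict[str, str] = {}
    for flag, rules in other.items():
        # Keep the flag if it is still unused, otherwise take the next free one
        # (raises IndexError if the 62 candidate characters run out).
        new_flag = flag if flag not in merged else free.pop(0)
        renames[flag] = new_flag
        merged[new_flag] = rules
    return merged, renames


def rename_entry(entry: str, renames: dict[str, str]) -> str:
    """Rewrite a 'word/FLAGS' .dic entry so it uses the renamed flags."""
    word, sep, flags = entry.partition("/")
    return word + sep + "".join(renames.get(f, f) for f in flags)


# Invented placeholder tables; the rule text is not taken from any real .aff file.
en_gb = {"S": ["(English plural rules)"], "M": ["(English possessive rules)"]}
de_de = {"S": ["(German rules)"], "N": ["(German rules)"]}
merged, renames = merge_flags(en_gb, de_de)
print(renames)                            # {'S': 'a', 'N': 'N'}
print(rename_entry("Haus/SN", renames))   # Haus/aN
```

Merging this way keeps every word of both languages, so it does nothing about the error-rate issue; that still depends on paring down the merged .dic file.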

Potential solution 3: User-defined dictionaries currently only allow users to
define inflections by analogy to words in ONE dictionary+aff pair: "inflect
this word like the word 'troggle' in the .dic file". There is no way to say
"inflect the word 'triggle' like the word 'troggle' in (xy_XY.dic and
xy_XY.aff), and inflect the word 'boing' like the word 'sproing' in (wz_WZ.dic
and wz_WZ.aff)".
        3a. I'd like to have the option to define inflections with variable
names (affix flags, as in the non-user-defined dictionaries, e.g. "Adam/SM", where "S"
and "M" are classes of inflections "Adam" takes, namely "Adams" and "Adam's").
        3b. I'd also like to use variable names that refer to a specific .aff
file. Example: if the word is "widget", defining inflections with
"widget/$en_GB_X" instead of "widget/X". It should also be possible to say
"widget/$en_GB_X+$de_DE_Y" or "widget/$en_GB_X$de_DE_X". But maybe multi-letter
names would run into format-definition problems. This solution would greatly
reduce the error-rate issue, especially combined with PS2. It would also save
manually copying inflection rules into a PS2 custom .aff file.
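To make 3a and 3b concrete, here is a minimal sketch of how a user-dictionary entry might be expanded if each flag could name the .aff file it comes from. The "$<locale>_<flag>" syntax is the hypothetical one proposed above, not something Hunspell or LibreOffice supports, and the affix tables are invented stand-ins (one suffix per flag) rather than the real contents of en_GB.aff or de_DE.aff. Unprefixed flags, as in 3a, fall back to the document language.

```python
import re

# Invented, heavily simplified affix tables: locale -> flag -> (strip, append)
# suffix rules. Real .aff files also have conditions, prefixes, and more.
AFFIXES: dict[str, dict[str, list[tuple[str, str]]]] = {
    "en_GB": {"S": [("", "s")], "M": [("", "'s")]},
    "de_DE": {"E": [("", "e")]},
}

# Hypothetical namespaced flag syntax from 3b, e.g. "$en_GB_X".
NAMESPACED_FLAG = re.compile(r"\$([a-z]{2,3}_[A-Z]{2})_([A-Za-z0-9])")


def expand(entry: str, default_locale: str = "en_GB") -> list[str]:
    """Expand 'word/FLAGS' into the word plus every form its flags generate."""
    word, _, flag_part = entry.partition("/")
    if "$" in flag_part:
        # 3b: each flag names its own locale, e.g. "$de_DE_E+$en_GB_M".
        pairs = NAMESPACED_FLAG.findall(flag_part)
    else:
        # 3a: plain flags such as "SM" refer to the document language.
        pairs = [(default_locale, flag) for flag in flag_part]
    forms = [word]
    for locale, flag in pairs:
        # Unknown locales or flags simply contribute no extra forms.
        for strip, append in AFFIXES.get(locale, {}).get(flag, []):
            if word.endswith(strip):
                forms.append(word[: len(word) - len(strip)] + append)
    return forms


print(expand("Adam/SM"))                      # ['Adam', 'Adams', "Adam's"]
print(expand("Bundestag/$de_DE_E+$en_GB_M"))  # ['Bundestag', 'Bundestage', "Bundestag's"]
```

Because the locale travels with the flag, an "S" from en_GB.aff can no longer be confused with an "S" from another .aff file, which is the collision problem PS1 runs into. The sketch only handles single-character flags after the locale prefix.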

Potential solution 4: Add a LibreOffice setting telling the spellchecker to use
multiple pairs of words+inflections, and to flag only words not found in any
selected language. For the suggestion algorithm, I'd be happy to set a
preference for the rules in one specified .aff file over another, or to set an
order of priorities. The error-rate issue occurs, but that may be acceptable to
many users.
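A minimal sketch of the PS4 behaviour, assuming each selected language has already been expanded into a set of accepted word forms (the word lists here are tiny invented samples). A word is flagged only if no selected language accepts it, and suggestions are gathered language by language in the user's priority order. Python's difflib.get_close_matches stands in for a real suggestion engine purely for illustration; Hunspell's own suggestion algorithm works differently.

```python
import difflib

# Selected languages in the user's priority order: (locale, accepted forms).
Lexicons = list[tuple[str, set[str]]]


def is_misspelled(word: str, lexicons: Lexicons) -> bool:
    """Flag a word only if it is not found in any selected language."""
    return not any(word in forms for _, forms in lexicons)


def suggest(word: str, lexicons: Lexicons, n: int = 3) -> list[str]:
    """Collect up to n suggestions, preferring higher-priority languages."""
    suggestions: list[str] = []
    for _, forms in lexicons:
        for candidate in difflib.get_close_matches(word, list(forms), n=n):
            if candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions[:n]


lexicons: Lexicons = [
    ("en_GB", {"the", "house", "mouse", "rules"}),
    ("de_DE", {"der", "Haus", "Maus", "Regeln"}),
]
print(is_misspelled("Haus", lexicons))  # False: accepted via de_DE
print(is_misspelled("hous", lexicons))  # True: no selected language accepts it
print(suggest("hous", lexicons))        # ['house', 'mouse']
```

Pooling the languages this way is exactly what produces the false-negative trade-off described earlier: a typo that happens to be a valid word in the other language passes silently.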



Reproducible: Always


User Profile Reset: No



Additional Info:
PS4 would probably be simplest for the majority of users, and useful for
language teaching and people writing about A-language texts in language B. PS2
would be the most flexible, and useful for people using rare languages. It
would encourage users to develop language tools for LibreOffice. Conlanggers
would love it, too.

PS3 has an additional use case. I may want to accept words from de_DE.dic
which have been inflected according to rules in the en_GB.aff file, or
vice-versa; for example, "The Bundestag's procedural rules forbid it" or "Ich
habe den Computer gecrasht" ("crash" is not really a German word, and
"Computer", as a German noun, is capitalized). With PS3 I can easily add these
cases to my user-defined dictionary.

It seems I am not the only one who would appreciate this sort of functionality,
which makes me fear that at least PS4 is a difficult feature to add:
https://ask.libreoffice.org/en/question/71151/simultaneously-use-two-languages-in-one-document/

PS2 would really be useful (even without adding PS2.5 and PS3, which would make
it more useful), and it looks, to my ignorant eyes, easier to implement.

Questions:
Do any of these potential solutions already exist, and if so, where can I learn
about them? If not, how feasible are PS2-4 as feature requests?
