<html> <head> <base href="https://bugs.documentfoundation.org/"> </head> <body><table border="1" cellspacing="0" cellpadding="8"> <tr> <th>Bug ID</th> <td><a class="bz_bug_link bz_status_UNCONFIRMED " title="UNCONFIRMED - spell checking should normalize data first" href="https://bugs.documentfoundation.org/show_bug.cgi?id=107769">107769</a> </td> </tr> <tr> <th>Summary</th> <td>spell checking should normalize data first </td> </tr> <tr> <th>Product</th> <td>LibreOffice </td> </tr> <tr> <th>Version</th> <td>5.4.0.0.alpha1+ Master </td> </tr> <tr> <th>Hardware</th> <td>All </td> </tr> <tr> <th>OS</th> <td>All </td> </tr> <tr> <th>Status</th> <td>UNCONFIRMED </td> </tr> <tr> <th>Severity</th> <td>normal </td> </tr> <tr> <th>Priority</th> <td>medium </td> </tr> <tr> <th>Component</th> <td>Linguistic </td> </tr> <tr> <th>Assignee</th> <td>libreoffice-bugs@lists.freedesktop.org </td> </tr> <tr> <th>Reporter</th> <td>martin_hosken@sil.org </td> </tr></table> <p> <div> <pre>Words to be spell-checked should be converted to NFKC first, so that spell-checking dictionaries don't need to hold every form (NFD, NFC, mixed) of a word.

I'm going to sketch my thoughts on how to do it here in case I can't get back to the bug for a while. Anyone want to take it further?

In SpellChecker::GetSpellFailure in lingucomponent/source/spell/sspellimpl.cxx, rather than doing a poor man's hand-rolled NFKC-style normalization into nWord, start with an nWord created something like:

    // OUString and icu::UnicodeString both hold UTF-16, so the buffer can be wrapped directly.
    icu::UnicodeString rIn(reinterpret_cast&lt;const UChar *&gt;(rWord.getStr()), rWord.getLength());
    icu::UnicodeString normal;
    UErrorCode rCode = U_ZERO_ERROR;  // ICU requires the status to start out as success
    icu::Normalizer::normalize(rIn, UNORM_NFKC, 0, normal, rCode);  // declared in unicode/normlzr.h
    // getBuffer() is const here, so cast to a const pointer; fall back to the
    // unnormalized word if ICU reported an error.
    OUString nWord(U_SUCCESS(rCode)
                       ? OUString(reinterpret_cast&lt;const sal_Unicode *&gt;(normal.getBuffer()), normal.length())
                       : rWord);

then use nWord instead of rWord for the rest of the function.
Need to find a test for this.</pre> </div> </p> </body> </html>