[Libreoffice-bugs] [Bug 117324] Hungarian dictionary contains invalid UTF-8 sequences

bugzilla-daemon at bugs.documentfoundation.org
Sat Apr 28 21:51:29 UTC 2018


https://bugs.documentfoundation.org/show_bug.cgi?id=117324

László Németh <nemeth at numbertext.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |nemeth at numbertext.org
         Resolution|---                         |INVALID

--- Comment #1 from László Németh <nemeth at numbertext.org> ---
The hu_HU.dic and hu_HU.aff files are not UTF-8 encoded files.

They contain UTF-8 encoded dictionary items (words and morphemes) together with
the default 8-bit flags; see the hunspell(5) manual page for the dictionary
format.
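
To illustrate the mixed encoding, here is a minimal Python sketch, not
Hunspell's actual parser. The function name split_dic_entry and the sample
entry are hypothetical, and the sketch assumes a plain "word/flags" line with
no escaped slash and no morphological fields. It reads the entry as raw bytes,
decodes only the word part as UTF-8, and keeps each flag as a single byte.

```python
def split_dic_entry(raw: bytes):
    """Split one raw 'word/flags' line into (word, flags).

    Assumption: no escaped '/' in the word and no trailing
    morphological fields; this is a simplification of the real format.
    """
    entry = raw.rstrip(b"\r\n")
    # Everything before the first '/' is the word, the rest are flags.
    word_bytes, _sep, flag_bytes = entry.partition(b"/")
    word = word_bytes.decode("utf-8")   # the word itself is UTF-8
    flags = list(flag_bytes)            # one 8-bit flag per byte (0..255)
    return word, flags


# Hypothetical entry: a UTF-8 word followed by two 8-bit flags,
# one of which (0xE9) is not valid UTF-8 on its own.
raw = "körte".encode("utf-8") + b"/A\xe9\n"
word, flags = split_dic_entry(raw)

try:
    raw.decode("utf-8")
    whole_line_is_utf8 = True
except UnicodeDecodeError:
    whole_line_is_utf8 = False
```

Decoding the whole line as UTF-8 fails on the flag byte, which is why a UTF-8
validator reports such files as containing invalid sequences even though the
words themselves decode cleanly.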

The suggested conversion doubles the memory footprint of the flag vectors, and
decoding the UTF-8 encoded flags slows down dictionary loading by 70% (plain
dictionary) or 50% (alias-compressed dictionary), resulting in noticeable
differences in the LibreOffice user interface.
