[Libreoffice-bugs] [Bug 116666] New: Fix Hungarian sorting

bugzilla-daemon at bugs.documentfoundation.org
Tue Mar 27 19:19:06 UTC 2018


https://bugs.documentfoundation.org/show_bug.cgi?id=116666

            Bug ID: 116666
           Summary: Fix Hungarian sorting
           Product: LibreOffice
           Version: Inherited From OOo
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Localization
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: nemeth at numbertext.org

Hungarian orthography rules contain the following extra requirements for
sorting words and sentences:

– expand simplified double consonants;

– ignore spaces and hyphens;

– prefer lower case homonyms.

(Source: http://helyesírás.mta.hu/helyesiras/default/akh12#F2_4)
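
A minimal Python sketch of the last two rules (ignoring spaces and hyphens,
preferring lower case homonyms). It assumes plain code point comparison as a
stand-in for a real Hungarian collator, so accented letters and digraphs are
not ordered correctly here. The name `homonym_sort_key` and the sample entries
are illustrative only and are not taken from LibreOffice or ICU.

```python
from typing import Tuple


def homonym_sort_key(entry: str) -> Tuple[str, str]:
    """Sort key that ignores spaces/hyphens and prefers lower case on ties."""
    # Rule: spaces and hyphens do not take part in the comparison.
    stripped = "".join(ch for ch in entry if ch not in " -")
    # Primary key: case-insensitive text (code point order, an assumption).
    # Tie-breaker: swapcase() maps lower case letters to upper case ones,
    # which have smaller code points, so the lower case homonym sorts first.
    return (stripped.lower(), stripped.swapcase())


# Illustrative entries only.
print(sorted(["Jácint", "jácint"], key=homonym_sort_key))
print(sorted(["kis kutya", "kiskapu"], key=homonym_sort_key))
```

With a naive string comparison the space in "kis kutya" would sort before any
letter; stripping it first makes the comparison run on "kiskutya" instead.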

Expansion of double consonants (e.g. sorting “ccs” (long “cs”) as “cscs”) is
still not perfect, but in my analysis it cuts the number of bad sorting
positions to about one fifth of what ordering without expansion gives (3843 vs.
19425 in 4 million word forms).
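
The expansion step could be sketched in Python as follows. This is a naive,
purely textual version that assumes every occurrence of a pattern such as “ssz”
is a simplified double consonant; it does not look at morpheme boundaries,
which is one reason such an expansion cannot be perfect. The function names
are hypothetical.

```python
import re

# Simplified double consonants and their full forms.
EXPANSIONS = {
    "ddzs": "dzsdzs",
    "ccs": "cscs",
    "ddz": "dzdz",
    "ggy": "gygy",
    "lly": "lyly",
    "nny": "nyny",
    "ssz": "szsz",
    "tty": "tyty",
    "zzs": "zszs",
}
# Try longer patterns first so that "ddzs" wins over "ddz".
EXPANSION_RE = re.compile(
    "|".join(sorted(EXPANSIONS, key=len, reverse=True)))
# Multi-character letters first, then any single character.
LETTER_RE = re.compile("dzs|cs|dz|gy|ly|ny|sz|ty|zs|.")


def expand_double_consonants(word: str) -> str:
    """Rewrite e.g. "ccs" as "cscs" (expects lower case input)."""
    return EXPANSION_RE.sub(lambda m: EXPANSIONS[m.group(0)], word)


def split_letters(word: str) -> list:
    """Split a lower case word into Hungarian letters, digraphs included."""
    return LETTER_RE.findall(word)


print(split_letters("hosszú"))                            # s + sz
print(split_letters(expand_double_consonants("hosszú")))  # sz + sz
```

Without the expansion the long “sz” is read as “s” followed by “sz”; after the
expansion it is compared as two “sz” letters.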

A more important advantage is that, with full expansion, Hungarian sorting can
be automated with manual (or, in the future, Hunspell-based) preprocessing.
(Unfortunately, the ICU collation algorithm alone is not yet enough for
Hungarian.) Inserting soft hyphens is a quick workaround here, too, as it is
for the similar problem of single consonants: e.g. “igazság” ->
igaz[U+AD]ság is correctly sorted before “igaztalan”.
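
The effect of the soft hyphen can be illustrated with a small Python sketch.
It is not how LibreOffice or ICU implement collation; it assumes a toy
comparison in which every letter of the Hungarian alphabet gets its own rank
(a simplification for accented vowels) and in which U+00AD only blocks digraph
matching. All names are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Hungarian alphabet in dictionary order (one rank per letter: simplified).
ALPHABET = ("a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny "
            "o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Multi-character letters first, then any single character.
LETTER_RE = re.compile("dzs|cs|dz|gy|ly|ny|sz|ty|zs|.")


def letters(word: str) -> list:
    """Split into letters; a soft hyphen keeps "z" + "s" from merging."""
    result = []
    for part in word.lower().split(SOFT_HYPHEN):
        result.extend(LETTER_RE.findall(part))
    return result


def letter_sort_key(word: str) -> list:
    """Rank each letter; unknown characters sort after the alphabet."""
    return [RANK.get(letter, len(ALPHABET)) for letter in letters(word)]


# Without the soft hyphen, "zs" is read as one letter and sorts after "z".
print(sorted(["igaztalan", "igazság"], key=letter_sort_key))
# With it, "z" + "s" is compared against "z" + "t", so "igazság" comes first.
print(sorted(["igaztalan", "igaz\u00adság"], key=letter_sort_key))
```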
