libexttextcat data garbled in Hungarian

Mark Robson markxr at gmail.com
Fri Oct 25 13:10:58 CEST 2013


Hi,

The data files for libexttextcat in this directory:

https://github.com/giuliopaci/libexttextcat/tree/master/langclass/ShortTexts

include a garbled Hungarian file. It is almost in iso-8859-1, but some
characters are destroyed because iso-8859-1 doesn't contain all Hungarian
characters.
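
To illustrate the point, here is a minimal Python sketch that lists which
Hungarian letters a given encoding cannot represent. The helper name
`unencodable` and the letter list are my own, for illustration only; nothing
here comes from libexttextcat itself.

```python
# Accented letters of the Hungarian alphabet, lower and upper case.
# (Illustrative list, assumed for this sketch.)
HUNGARIAN_LETTERS = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"


def unencodable(text, encoding):
    """Return the characters in `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# iso-8859-1 lacks the double-acute letters; iso-8859-2 has all of them.
print(unencodable(HUNGARIAN_LETTERS, "iso-8859-1"))
print(unencodable(HUNGARIAN_LETTERS, "iso-8859-2"))
```

The letters reported for iso-8859-1 are the ones that would be lost or
replaced if the text had been forced through that encoding.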

A good utf-8 version is easy to get from

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hng

and see the difference.

It's not clear whether this prevents libexttextcat from classifying
Hungarian text correctly, but it may stop Hungarian detection working for
utf-8 input, because most of the other files are in utf-8.
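
A quick way to see which data files are not valid utf-8 is something like
the following Python sketch. The function names are hypothetical and the
file paths are whatever you pass on the command line; this only checks
that the bytes decode as utf-8, not that the text is correct Hungarian.

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_files(paths):
    """Map each path to True (valid utf-8) or False (not valid utf-8)."""
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            results[path] = is_valid_utf8(f.read())
    return results


if __name__ == "__main__":
    # Usage (hypothetical): python check_utf8.py file1 file2 ...
    for path, ok in check_files(sys.argv[1:]).items():
        print(("ok      " if ok else "NOT utf8"), path)
```

A file written in iso-8859-1 with accented letters will normally fail this
check, because a lone high byte such as 0xE9 is not a complete utf-8
sequence.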

Cheers

Mark
