libexttextcat data garbled in Hungarian
Mark Robson
markxr at gmail.com
Fri Oct 25 13:10:58 CEST 2013
Hi,
The libexttextcat data files in this directory:
https://github.com/giuliopaci/libexttextcat/tree/master/langclass/ShortTexts
include a garbled Hungarian sample. It is almost ISO-8859-1, but some
characters are destroyed because ISO-8859-1 does not contain all
Hungarian characters.
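
To illustrate the point about ISO-8859-1, here is a minimal Python sketch
that lists which Hungarian accented letters an encoding cannot represent.
The helper name unencodable and the letter list are my own for
illustration; nothing here is taken from the libexttextcat data file.

```python
def unencodable(text, encoding):
    """Return the sorted characters of text that encoding cannot represent."""
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


# Accented letters of the Hungarian alphabet (illustrative list, assumed here).
hungarian = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"

print(unencodable(hungarian, "iso-8859-1"))  # ['Ő', 'ő', 'Ű', 'ű']
print(unencodable(hungarian, "iso-8859-2"))  # []
```

Only the double-acute letters are missing from ISO-8859-1, which would
fit a file that looks "almost" right.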
It is easy to pick up a good UTF-8 version from
http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hng
and see the difference.
It's not clear whether this prevents libexttextcat from classifying
Hungarian text correctly, but it may stop Hungarian detection from
working on UTF-8 input, because most of the other files are in UTF-8.
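
A quick way to check whether a given data file is valid UTF-8 is to try
decoding its bytes. This is only a sketch: is_valid_utf8 is a
hypothetical helper, and the two byte strings below are made-up samples,
not the contents of the actual file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Made-up samples: the same word stored as UTF-8, and a Latin-1
# approximation in which "ő" has been replaced by "ö".
good = "egyenlő".encode("utf-8")
bad = "egyenlö".encode("iso-8859-1")

print(is_valid_utf8(good))  # True
print(is_valid_utf8(bad))   # False
```

To test a real file, read it in binary mode and pass the bytes to the
same function.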
Cheers
Mark