[Libreoffice-bugs] [Bug 117324] New: Hungarian dictionary contains invalid UTF-8 sequences
bugzilla-daemon at bugs.documentfoundation.org
Sat Apr 28 21:12:07 UTC 2018
https://bugs.documentfoundation.org/show_bug.cgi?id=117324
Bug ID: 117324
Summary: Hungarian dictionary contains invalid UTF-8 sequences
Product: LibreOffice
Version: 6.1.0.0.alpha1+ Master
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Linguistic
Assignee: libreoffice-bugs at lists.freedesktop.org
Reporter: pander at users.sourceforge.net
Description:
The Hungarian dictionary contains invalid UTF-8 sequences and cannot be used or
converted. For exact details, see
https://github.com/hunspell/hunspell/issues/559
Steps to Reproduce:
Open hu_HU_u8.aff in gedit
sudo apt install hunspell-hu
gedit /usr/share/hunspell/hu_HU.aff --encoding=UTF-8
Actual Results:
gedit shows an error. If gedit instead tries to interpret the file as
ISO-8859-15, open the file with the --encoding option.
Expected Results:
No error should be shown by the text editor. Valid UTF-8 is expected.
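The same check can be made without a GUI editor. The following is a minimal sketch, not part of the original report: it assumes an iconv that exits with a non-zero status on malformed input (as glibc iconv does), and the function name is_valid_utf8 is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): succeeds only if the whole file
# decodes cleanly as UTF-8. Assumes iconv exits non-zero on malformed input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example call, using the path from the steps to reproduce:
#   is_valid_utf8 /usr/share/hunspell/hu_HU.aff && echo valid || echo invalid
```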
Reproducible: Always
User Profile Reset: Yes
Additional Info:
Solution
Invalid UTF-8 appears only in comments and in flag vectors.
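To see which lines are affected, something like the following sketch could be used. It is an illustration only: the function name list_invalid_utf8_lines is hypothetical, it assumes iconv exits non-zero on malformed input, and it starts one iconv process per line, so it is slow on a large .aff file.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): print the number of every line
# in the given file that is not valid UTF-8.
# The function body runs in a subshell, so its variables stay private.
list_invalid_utf8_lines() (
    n=0
    # Read raw lines; the "|| [ -n ... ]" part keeps a final line that has
    # no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
)

# Example call, using the path from the steps to reproduce:
#   list_invalid_utf8_lines /usr/share/hunspell/hu_HU.aff
```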
Upstream is at https://sourceforge.net/projects/magyarispell/ ; open the
source tarball.
The fix belongs in the file bin/u8myspell. The following script should fix the
problem completely.
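#!/bin/bash
# u8myspell - converts MySpell dictionaries to UTF-8
set -x
export LANG=en_US
export LC_ALL=C
# All three arguments are required.
case $# in
0|1|2) echo "u8myspell - converts MySpell dictionaries to UTF-8
usage: u8myspell source_name output_name source_charset"; exit 1;;
esac
i=$1
o=$2
charset=$3
# Directory of this script, where l1_u8.sed is expected.
localdir="$(dirname "$0")"
# Convert the word list to UTF-8.
iconv -f "$charset" -t UTF-8 "$i.dic" | sed -f "$localdir"/l1_u8.sed > "$o.dic"
# Convert the affix file and replace its SET line with SET UTF-8 plus FLAG UTF-8.
iconv -f "$charset" -t UTF-8 "$i.aff" |
sed 's/^SET .*$/SET UTF-8\
FLAG UTF-8/' | sed -f "$localdir"/l1_u8.sed > "$o.aff"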
In short, the Latin-2 (ISO-8859-2) input is converted to UTF-8, and the
FLAG UTF-8 command is additionally added to the .aff file.
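The header rewrite can be tried in isolation. The sketch below reuses the sed expression from the script; the function names rewrite_aff_header and aff_declares_utf8 are hypothetical and only serve the illustration. Input without a SET line passes through unchanged.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): read an .aff stream on stdin and
# replace its SET line with "SET UTF-8" followed by a "FLAG UTF-8" line.
rewrite_aff_header() {
    sed 's/^SET .*$/SET UTF-8\
FLAG UTF-8/'
}

# Hypothetical helper: succeed only if the .aff file declares both settings.
aff_declares_utf8() {
    grep -q '^SET UTF-8$' "$1" && grep -q '^FLAG UTF-8$' "$1"
}

# Example call on already converted input:
#   rewrite_aff_header < converted.aff > out.aff && aff_declares_utf8 out.aff
```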
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101
Firefox/59.0