<html>
    <head>
      <base href="https://bugs.documentfoundation.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_UNCONFIRMED "
   title="UNCONFIRMED - Hungarian dictionary contains invalid UTF-8 sequences"
   href="https://bugs.documentfoundation.org/show_bug.cgi?id=117324">117324</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Hungarian dictionary contains invalid UTF-8 sequences
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>LibreOffice
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>6.1.0.0.alpha1+ Master
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>UNCONFIRMED
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Linguistic
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>libreoffice-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>pander@users.sourceforge.net
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Description:
The Hungarian dictionary contains invalid UTF-8 sequences and cannot be used or
converted. For exact details, see
<a href="https://github.com/hunspell/hunspell/issues/559">https://github.com/hunspell/hunspell/issues/559</a>

Steps to Reproduce:
Open hu_HU_u8.aff in gedit

sudo apt install hunspell-hu
gedit /usr/share/hunspell/hu_HU.aff --encoding=UTF-8


Actual Results:  
gedit reports an error when opening the file. If gedit instead tries to
interpret the file as ISO-8859-15, force UTF-8 with the --encoding option.

Expected Results:
No error should be shown by the text editor. Valid UTF-8 is expected.
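
The same check can be made without a GUI editor by passing the file through
iconv. The following is a minimal sketch, assuming an iconv that exits with a
non-zero status on the first invalid input sequence (glibc's does). The
function name is_valid_utf8 is made up for illustration and is not part of
hunspell or magyarispell.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but forces iconv to validate
# every byte sequence; it exits non-zero on the first invalid one.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2> /dev/null
}

# Check every file named on the command line.
for f in "$@"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: invalid UTF-8 (or unreadable)"
    fi
done
```

Saved under a name of your choice (check_utf8.sh is only an example), it could
be run as: bash check_utf8.sh /usr/share/hunspell/hu_HU.aff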


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Solution

Invalid UTF-8 appears only in comments and in flag vectors.

The upstream project is at <a href="https://sourceforge.net/projects/magyarispell/">https://sourceforge.net/projects/magyarispell/</a>.
Open its source tarball.

The fix is in the file bin/u8myspell. The following script should fix it
completely.

#!/bin/bash
set -x
export LANG=en_US
export LC_ALL=C

case $# in
0|1|2) echo "u8myspell - converts MySpell dictionaries to UTF-8
usage: u8myspell source_name output_name source_charset"; exit 1;;
esac

i=$1
o=$2
charset=$3
localdir="$(dirname "$0")"

iconv -f "$charset" -t UTF-8 "$i.dic" | sed -f "$localdir"/l1_u8.sed > "$o.dic"
iconv -f "$charset" -t UTF-8 "$i.aff" |
sed 's/^SET .*$/SET UTF-8\
FLAG UTF-8/' | sed -f "$localdir"/l1_u8.sed > "$o.aff"

In short, the Latin-2 input is converted to UTF-8, and a FLAG UTF-8 line is
added to the .aff file in addition to SET UTF-8.
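
The header rewrite performed by the sed step can be tried on a small sample.
This is only a sketch: the two-line input is made up rather than taken from
the real dictionary, and rewrite_header is a hypothetical name wrapping the
sed expression from the script above.

```shell
#!/bin/bash
# Hypothetical wrapper around the sed expression used in the script:
# replace the SET line with SET UTF-8 and add a FLAG UTF-8 line after it.
rewrite_header() {
    sed 's/^SET .*$/SET UTF-8\
FLAG UTF-8/'
}

# Made-up sample input, not taken from the real dictionary.
printf 'SET ISO8859-2\nTRY abc\n' | rewrite_header
```

The backslash followed by a newline inside the replacement is what makes sed
emit two header lines in place of the original SET line; all other lines pass
through unchanged.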


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101
Firefox/59.0</pre>
        </div>
      </p>


    </body>
</html>