<html>
<head>
<base href="https://bugs.documentfoundation.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:serval2412@yahoo.fr" title="Julien Nabet <serval2412@yahoo.fr>"> <span class="fn">Julien Nabet</span></a>
</span> changed
<a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED INVALID - Data in Visual FoxPro DBF is garbled"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=69744">bug 69744</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">CC</td>
<td>
</td>
<td>serval2412@yahoo.fr
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED INVALID - Data in Visual FoxPro DBF is garbled"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=69744#c11">Comment # 11</a>
on <a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED INVALID - Data in Visual FoxPro DBF is garbled"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=69744">bug 69744</a>
from <span class="vcard"><a class="email" href="mailto:serval2412@yahoo.fr" title="Julien Nabet <serval2412@yahoo.fr>"> <span class="fn">Julien Nabet</span></a>
</span></b>
<pre>Following recent dBase commits (see
<a href="https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=dbase">https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=dbase</a>), the dbf
files open with RTL_TEXTENCODING_IBM_866 (Russian MS-DOS code page 866).
A hexdump of the file shows this:
0000000 0d30 1809 0001 0000 0148 0051 0000 0000
0000010 0000 0000 0000 0000 0000 0000 6500 0000
0000020 808d 8287 8d80 8588 0000 4300 0001 0000
0000030 0050 0004 0000 0000 0000 0000 0000 0000
0000040 000d 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
0000140 0000 0000 0000 0000 d020 f1f3 eaf1 e9e8
0000150 f220 eae5 f2f1 2020 2020 2020 2020 2020
0000160 2020 2020 2020 2020 2020 2020 2020 2020
*
0000190 2020 2020 2020 2020 1a20
000019a
hexdump prints 16-bit little-endian words, so the two bytes of each word are
swapped: the first byte of the file is 30, not 0d.
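To make the byte order explicit, here is a minimal Python sketch (my own
helper, not LibreOffice code) that turns the first hexdump line back into
file byte order. The name words_to_bytes is made up for this illustration.

```python
def words_to_bytes(words):
    """Turn hexdump 16-bit words such as '0d30' back into file byte order."""
    out = bytearray()
    for word in words.split():
        # hexdump shows each pair of bytes as one little-endian integer
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)

first_line = words_to_bytes("0d30 1809 0001 0000 0148 0051 0000 0000")
print(" ".join(format(b, "02x") for b in first_line))
# prints: 30 0d 09 18 01 00 00 00 48 01 51 00 00 00 00 00
```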
30 is the version byte and corresponds here to a Visual FoxPro file (see
<a href="http://opengrok.libreoffice.org/xref/core/connectivity/source/inc/dbase/DTable.hxx#40">http://opengrok.libreoffice.org/xref/core/connectivity/source/inc/dbase/DTable.hxx#40</a>).
65 (in the second line) indicates RTL_TEXTENCODING_IBM_866.
The third line gives the field name and its field type, and the "50" at the
beginning of the fourth line gives the field length (80 in decimal).
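The header values read above can also be pulled out mechanically. This is
only a sketch in Python (not LibreOffice code), assuming the usual
dBASE/FoxPro header layout: version at byte 0, header length at bytes 8-9,
language driver ID at byte 29, first field descriptor at byte 32 with the
field type at +11 and the field length at +16. The input is the first four
hexdump lines, already put back into file byte order.

```python
header = bytes.fromhex(
    "30 0d 09 18 01 00 00 00 48 01 51 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 00 65 00 00"
    "8d 80 87 82 80 8d 88 85 00 00 00 43 01 00 00 00"
    "50 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

version = header[0]                 # 0x30: Visual FoxPro
header_length = int.from_bytes(header[8:10], "little")  # offset of first record
language_driver = header[29]        # 0x65: code page 866

field = header[32:64]               # first 32-byte field descriptor
field_type = chr(field[11])         # 'C' means character field
field_length = field[16]            # 0x50, i.e. 80 in decimal

print(hex(version), hex(language_driver), field_type, field_length)
print("first record starts at", hex(header_length))
```

The header length comes out as 0x148, which is where the record content
starts in the hexdump.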
Lines 7 and 8 (offsets 0000140 and 0000150) then give the content of the
record, but nothing about its encoding.
So I don't know how LO could "guess" the encoding of the content except by
testing the value ranges of candidate charsets, e.g.:
d0 in <a href="https://www.ascii-codes.com/cp866.html">https://www.ascii-codes.com/cp866.html</a> gives "Box drawings up double and
horizontal single"
d0 in <a href="http://www.iana.org/assignments/charset-reg/PTCP154">http://www.iana.org/assignments/charset-reg/PTCP154</a> gives "CYRILLIC
CAPITAL LETTER ER"
But even with this, a user could want some non-Cyrillic characters (box
drawings) in the content, and the guess would then be wrong.
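To illustrate what such range testing could look like, here is a rough
Python sketch (not what LO does, just an assumption of one possible
heuristic): decode the record bytes with each candidate charset and prefer
the one that yields the fewest box-drawing characters. The function name
box_drawing_count is made up for this example, and the bytes are the non-blank
part of the record from the hexdump.

```python
import unicodedata

# Record content from the hexdump, in file byte order (without padding).
record = bytes.fromhex("d0 f3 f1 f1 ea e8 e9 20 f2 e5 ea f1 f2")
candidates = ("cp866", "ptcp154")

def box_drawing_count(data, codec):
    """Count decoded characters whose Unicode name is a box-drawing one."""
    text = data.decode(codec)
    return sum(
        unicodedata.name(ch, "").startswith("BOX DRAWINGS") for ch in text
    )

for codec in candidates:
    first_char = record[:1].decode(codec)
    print(codec, unicodedata.name(first_char), box_drawing_count(record, codec))

# Naive guess: the charset producing the fewest box-drawing characters.
guess = min(candidates, key=lambda codec: box_drawing_count(record, codec))
print("guess:", guess)
```

For this record the guess is ptcp154, but a record that really contains
box drawings would defeat the same test.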
BTW, I would be interested in original dbf files in different versions (DB2,
DB3, DB4... with memo, with sql, ...FoxPro, etc.) and encodings.</pre>
</div>
</p>
</body>
</html>