[Libreoffice-bugs] [Bug 69744] Data in Visual FoxPro DBF is garbled

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Sat Jun 17 16:18:37 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=69744

Julien Nabet <serval2412 at yahoo.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |serval2412 at yahoo.fr

--- Comment #11 from Julien Nabet <serval2412 at yahoo.fr> ---
Following recent dBase commits (see
https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q=dbase), the dbf
file is opened with RTL_TEXTENCODING_IBM_866 (Russian MS-DOS code page 866).
A hexdump of the file shows this:
0000000 0d30 1809 0001 0000 0148 0051 0000 0000
0000010 0000 0000 0000 0000 0000 0000 6500 0000
0000020 808d 8287 8d80 8588 0000 4300 0001 0000
0000030 0050 0004 0000 0000 0000 0000 0000 0000
0000040 000d 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
0000140 0000 0000 0000 0000 d020 f1f3 eaf1 e9e8
0000150 f220 eae5 f2f1 2020 2020 2020 2020 2020
0000160 2020 2020 2020 2020 2020 2020 2020 2020
*
0000190 2020 2020 2020 2020 1a20               
000019a

hexdump prints 16-bit words, so let's read it the little-endian way: the first
byte is 30, not 0d.
30 is the version and corresponds here to a Visual FoxPro file (see
http://opengrok.libreoffice.org/xref/core/connectivity/source/inc/dbase/DTable.hxx#40).
65 (in the second line) indicates RTL_TEXTENCODING_IBM_866.
The third line gives the field name and its field type, and the "50" at the
beginning of the fourth line gives the field length (80 in decimal).
Lines 7 and 8 (offsets 0000140 and 0000150) then give the content of the
record, but nothing about its encoding.
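
To make the reading above reproducible, here is a minimal Python sketch (not
LibreOffice code). It rebuilds the bytes from the dump and picks out the
fields mentioned. The helper name hexdump_to_bytes and the variable names are
made up for this example, and the offsets are assumptions taken from the
commonly documented DBF / Visual FoxPro header layout.

```python
import struct

# The dump from above, verbatim. "*" stands for repeated identical rows.
DUMP = """\
0000000 0d30 1809 0001 0000 0148 0051 0000 0000
0000010 0000 0000 0000 0000 0000 0000 6500 0000
0000020 808d 8287 8d80 8588 0000 4300 0001 0000
0000030 0050 0004 0000 0000 0000 0000 0000 0000
0000040 000d 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
0000140 0000 0000 0000 0000 d020 f1f3 eaf1 e9e8
0000150 f220 eae5 f2f1 2020 2020 2020 2020 2020
0000160 2020 2020 2020 2020 2020 2020 2020 2020
*
0000190 2020 2020 2020 2020 1a20
000019a
"""


def hexdump_to_bytes(dump: str) -> bytes:
    """Rebuild the file from default `hexdump` output (16-bit LE words)."""
    out = bytearray()
    pending_fill = False
    for line in dump.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "*":
            pending_fill = True
            continue
        offset = int(parts[0], 16)
        if pending_fill:
            row = bytes(out[-16:])  # the row that "*" repeats
            while len(out) < offset:
                out += row
            pending_fill = False
        for word in parts[1:]:
            out += bytes.fromhex(word)[::-1]  # "0d30" -> bytes 30 0d
    return bytes(out)


data = hexdump_to_bytes(DUMP)

version = data[0]                      # 0x30: Visual FoxPro
n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
codepage_mark = data[29]               # 0x65: code page 866

# First (and only) field descriptor starts at offset 32.
field_name = data[32:43].rstrip(b"\x00").decode("cp866")
field_type = chr(data[43])
field_len = data[48]

# Record: one deletion-flag byte, then the field content (space padded).
record = data[header_len:header_len + record_len]
content = record[1:1 + field_len].rstrip(b" ")

print(f"version=0x{version:02x} codepage_mark=0x{codepage_mark:02x}")
print(f"records={n_records} header={header_len} record_len={record_len}")
print(f"field {field_name!r} type={field_type} len={field_len}")
print("content bytes:", content.hex(" "))
```

The header fields come out as described (version 0x30, code page mark 0x65,
one character field of length 80), and the record content is left as raw
bytes because the header is the only place that says how to decode them.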

So I don't know how LO could "guess" the encoding of the content except by
testing the value ranges of charsets, e.g.:
d0 in https://www.ascii-codes.com/cp866.html gives "Box drawings up double and
horizontal single"
d0 in http://www.iana.org/assignments/charset-reg/PTCP154 gives "CYRILLIC
CAPITAL LETTER ER"
But even with this, a user could want some non-Cyrillic characters (box
drawings) in the content, and the guessing would be wrong.
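
A naive Python sketch of such a range test, only to illustrate the idea: it
decodes the record bytes from the dump with each candidate charset and counts
how many characters land in the basic Russian alphabet. The candidate list,
the function name russian_letter_ratio and the scoring rule are all
assumptions of this example, not something LO does.

```python
# Record content taken from the dump (trailing padding removed).
RAW = bytes.fromhex("d0f3f1f1eae8e920f2e5eaf1f2")

# Candidate charsets to try; the list is an assumption for this example.
CANDIDATES = ("cp866", "cp1251", "ptcp154")


def russian_letter_ratio(text: str) -> float:
    """Share of non-space characters in U+0410..U+044F (basic Russian)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if "\u0410" <= c <= "\u044f")
    return hits / len(chars)


scores = {}
decoded = {}
for enc in CANDIDATES:
    decoded[enc] = RAW.decode(enc, errors="replace")
    scores[enc] = russian_letter_ratio(decoded[enc])
    print(f"{enc:8} {scores[enc]:.2f} {decoded[enc]}")
```

For this record cp1251 scores higher than cp866, but charsets that share the
same Cyrillic range can tie, and a field that really contains box drawings
would be scored down, which is exactly the weakness described above.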

BTW, I would be interested in original dbf files in different versions (DB2,
DB3, DB4... with memo, with sql, ...FoxPro, etc.) and encodings.
