A proposal for standardizing TSV files

Piotr Mitros piotr at mitros.org
Thu Nov 3 12:08:23 UTC 2016


Hello,

I do a fair bit of work where I move data between LibreOffice, MySQL,
Vertica, Google Docs, Hadoop, Python, and a few other systems. The
formatting of TSV files is ad hoc: each system differs slightly in how
it escapes strings and handles similar details. In addition, there is no
way to preserve metadata.

I have drafted a modest spec that standardizes TSV files by defining
types and adding metadata, and I was hoping to solicit feedback on that
proposal:

http://www.tsvx.org/

I'm trying to preserve the parts of TSV that make it great -- simplicity,
human-readability, and rapid single-pass parsing -- but add enough
structure to eliminate the scripting that goes on when moving data between
systems, as well as some of the brittleness (TSV files break if a column
is added, and one-pass parsing breaks if an unexpected type is found 10GB
down).
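
To make the brittleness concrete, here is a minimal Python sketch. It is
not the proposed format, only an illustration of the problem; the sample
data and both function names are made up for this example. A reader that
picks columns by position breaks once a column is added, and the type
error only surfaces when the offending row is reached.

```python
import csv
import io

# Hypothetical data: the same export before and after a column was added.
OLD = "name\tage\nAda\t36\n"
NEW = "name\temail\tage\nAda\tada@example.org\t36\n"


def ages_by_position(text, index=1):
    """Read the age column by fixed position (brittle)."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header row
    # int() fails only when the bad row is reached, not up front.
    return [int(row[index]) for row in rows]


def ages_by_name(text, column="age"):
    """Read the age column by header name (survives added columns)."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [int(row[column]) for row in rows]
```

With the old file both readers agree. With the new file, ages_by_name
still finds the age column, while ages_by_position raises a ValueError
because position 1 now holds the email address.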

Since this touches closely on LibreOffice, and, if it becomes a standard,
it's something we'd all have to live with, I would especially welcome
feedback from LibreOffice developers.

GitHub issues (https://github.com/pmitros/tsvx/issues) are the preferred
way of communicating, but I'll monitor this thread, and personal email is
okay as well.

Piotr