A proposal for standardizing TSV files

Piotr Mitros piotr at mitros.org
Fri Nov 4 21:13:08 UTC 2016


Thank you. I read the W3C recommendation, as well as the referenced
documents. I drafted a comparison here:
  https://github.com/pmitros/tsvx/blob/master/doc_source/related_formats.md

I think the standards are trying to do something a bit different, and are
actually pretty complementary. tsvx is designed to facilitate compatibility
between applications for internal data analysis and BI work. It is a
prescriptive standard. It says how files ought to be escaped and formatted.
The W3C CSV on the Web group appears to be doing exactly what its name implies:
providing descriptive metadata for public redistribution of datasets on
the web, especially for use on the semantic web. It is a descriptive
standard designed to work with essentially all tabular data files. A
tsvx file could certainly be described with the W3C metadata if the
intention were external distribution.

To give a sense of the use cases I have internally:

   - I have pipelines where I might have a dozen TSV files generated by
   scripts working on data from MySQL, Vertica, and spreadsheets, all feeding
   back to create reports. Before I switched to tsvx, scripts were brittle to
   fairly modest format changes (e.g. adding a column), and had a bunch of
   unnecessary logic parsing data types.
   - Each time I import something into a tool I didn't create, I need to
   click through a dialog letting it know what the delimiter is, and in
   LibreOffice, reformat column types.
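
As a rough illustration of the brittleness in the first use case, here is a
minimal Python sketch. It is not the tsvx format or its parser; the sample
data, the column names, and both function names are hypothetical. It only
contrasts a script that reads a column by position with one that reads it by
header name when a column is added upstream.

```python
import csv
import io

# Hypothetical sample data: the same report before and after an upstream
# script adds a "region" column.
OLD = "user\tscore\nalice\t3\nbob\t5\n"
NEW = "user\tregion\tscore\nalice\teu\t3\nbob\tus\t5\n"


def total_by_position(text):
    """Brittle: assumes the score is always the second column."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return sum(int(row[1]) for row in rows[1:])  # skip the header row


def total_by_name(text):
    """Tolerates added columns: looks the score up by its header name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return sum(int(row["score"]) for row in reader)
```

With the new layout, total_by_position fails because it tries to convert a
region string to an integer, while total_by_name is unaffected. Both still
carry their own int() conversion, which is the kind of type-parsing logic
that type information stored with the file could make unnecessary.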

Adding W3C metadata files would add overhead for this type of work, rather
than reducing it, and would only provide benefit at the stage of the final
results.

Piotr

On Thu, Nov 3, 2016 at 10:35 AM, Eike Rathke <erack at redhat.com> wrote:

> Hi Piotr,
>
> On Thursday, 2016-11-03 08:08:23 -0400, Piotr Mitros wrote:
>
> > I do a fair bit of work where I move data between LibreOffice, MySQL,
> > Vertica, Google Docs, Hadoop, Python, and a few other systems. The
> > formatting of TSV files is ad-hoc. Each system has little differences in
> > how strings are escaped, and similar. In addition, there is no way to
> > preserve metadata.
> >
> > I drafted a modest proposed spec for standardizing TSV files by
> > standardizing types, and adding metadata, and was hoping to solicit
> > feedback on that proposal:
> >
> > http://www.tsvx.org/
>
> It seems to me you're attempting to reinvent a wheel. I suggest you take
> a look at https://www.w3.org/standards/techs/csv and maybe
> https://www.w3.org/community/csvw/
>
>   Eike
>
> --
> LibreOffice Calc developer. Number formatter stricken i18n
> transpositionizer.
> GPG key "ID" 0x65632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563
> 2D3A
> Better use 64-bit 0x6A6CD5B765632D3A here is why: https://evil32.com/
> Care about Free Software, support the FSFE https://fsfe.org/support/?erack
>