extended attribute standardization

Sun Nov 19 15:15:08 EET 2006

2006/11/19, Michael Burschik <Michael.Burschik at gmx.de>:
> > For example OpenDocument supports Dublin core, and applications
> > writing OpenDocument could set Dublin core attributes as xattrs on the
> > file when saving as a service for indexers etc, but since they are
> > already persisted in the file, it would be somewhat redundant.
> But in order to extract that information, you need to be able to read
> the document format. Since there are too many document formats to
> number, there would be considerable value in duplicating the information
> in a readily readable format, such as extended attributes.

Storing information that is already in the file a second time in the
extended attributes is problematic for two reasons: 1) it takes
twice the space, and 2) the two copies can get out of sync.
It would be much more sensible to have a standard API for getting the
information out of the file. A good example of a simple program that
does this is 'xmlindexer', which comes with Strigi. It takes a file or
directory as its argument and extracts all information in a universal
XML format, independent of the file type.
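
To illustrate the idea of one format-independent call, here is a
minimal sketch in Python. It is not Strigi's interface or xmlindexer's
actual output: the names extract_metadata and to_xml, the extractor
table, and the <file>/<value> element names are all made up for this
example.

```python
import os
import xml.etree.ElementTree as ET


def _extract_text(path):
    """Hypothetical extractor for plain text files."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        content = f.read()
    return {
        "line_count": str(len(content.splitlines())),
        "word_count": str(len(content.split())),
    }


def _extract_generic(path):
    """Fallback for formats without a dedicated extractor."""
    return {}


# Assumed dispatch table: file extension -> extractor function.
EXTRACTORS = {".txt": _extract_text}


def extract_metadata(path):
    """Return metadata as a dict, whatever the file type is."""
    ext = os.path.splitext(path)[1].lower()
    fields = {"size": str(os.path.getsize(path))}
    fields.update(EXTRACTORS.get(ext, _extract_generic)(path))
    return fields


def to_xml(path, fields):
    """Serialize the fields in one uniform (assumed) XML layout."""
    root = ET.Element("file", {"path": path})
    for name in sorted(fields):
        ET.SubElement(root, "value", {"name": name}).text = fields[name]
    return ET.tostring(root, encoding="unicode")
```

A caller only ever sees extract_metadata() and the XML it produces;
supporting a new document format means adding one entry to the table,
and nothing has to be duplicated into extended attributes.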

A few weeks back, I submitted a proposal for a test set for
unifying and validating the output of applications that extract
metadata from files. The proposal included code for performing the
validation. It requires that the output of the programs being
validated has the same format. I chose XML because it is easy to parse
and easy for existing programs to add as an output format. Currently,
I use this XML output for unit testing the extraction of data from
files, and it is very valuable in protecting against regressions.
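
As a rough sketch of how such a regression check can work (this is not
the code from the proposal), the Python below compares a stored
expected XML output against a fresh one. The <value name="..."> layout
and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def fields_from_xml(xml_text):
    """Collect (name, value) pairs from an assumed <value name=...> layout."""
    root = ET.fromstring(xml_text)
    return {
        (v.get("name"), (v.text or "").strip())
        for v in root.iter("value")
    }


def compare_outputs(expected_xml, actual_xml):
    """Return (missing, unexpected) field lists; both empty means no regression."""
    expected = fields_from_xml(expected_xml)
    actual = fields_from_xml(actual_xml)
    return sorted(expected - actual), sorted(actual - expected)
```

Comparing sets of fields rather than raw text means that differences
in element order or whitespace between two extractors do not count as
failures.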

Cheers,
Jos