Serialize extended attributes?

Vegard Nossum vegard at peltkore.net
Mon Jul 16 03:21:25 PDT 2007


On Sat, July 14, 2007 3:07 pm, Mildred wrote:
> I would like to see extended attributes more widely used,
> especially for the MIME type of documents. libmagic is really good,
> but sometimes it can't tell apart file types that are very similar.
> Extended attributes could also be used to store other information,
> such as the encoding (utf-8, iso-8859-1, utf-16, ...) of text files.

Yes, I agree.  There are many potential uses of xattrs that nobody is
actually taking advantage of.

> The problem is that "many applications and filesystems do not support
> extended attributes".
>
> This problem also occurs when you want to send a file to someone else
> or when you want to store a document with extended attributes on a
> filesystem where extended attributes are not available. So what about
> creating a special file that serializes the extended attributes, so
> that the data found in these attributes is not lost?

This is quite feasible. It would also solve a lot of other problems
automatically, for instance losing metadata when you download a file over
HTTP. (Just download both the file and its .metadata and nothing is lost;
heck, Apache could be modified to serve these files virtually even if the
file on the server has its metadata stored using real fs xattrs.)

In fact, when I think of it, a lot of people create .md5 or .sha1 checksum
files and store them on their servers along with the real files. This is a
kind of metadata similar to serialized xattrs, though a rather highly
specialized use.

It would be nice to have a format specification for the serialized xattrs.
However, this is not enough. There must also be a C library that
implements the specification. If not, the format will never be used
consistently. I think such a library should provide transparent access to
both native filesystem xattrs and serialized metadata files with as little
user interaction as possible (i.e. simple from the user/developer point of
view).
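
To make the "transparent access" idea concrete, here is a minimal sketch
(in Python rather than C, purely for brevity). All names are hypothetical,
and the policy that native xattrs override serialized ones is my own
assumption, not something settled in this thread. The sidecar parser is
passed in as a function because no format has been agreed on yet.

```python
import os


def read_native(path):
    """Read user.* xattrs from the filesystem; empty dict if unsupported."""
    if not hasattr(os, "listxattr"):   # os.*xattr only exists on Linux
        return {}
    try:
        return {name: os.getxattr(path, name)
                for name in os.listxattr(path)
                if name.startswith("user.")}
    except OSError:                    # e.g. filesystem without xattr support
        return {}


def read_attrs(path, read_sidecar):
    """Merge both sources. Assumed policy: native xattrs win.

    read_sidecar is a hypothetical callable returning {name: bytes}
    parsed from the serialized metadata file that belongs to path.
    """
    attrs = dict(read_sidecar(path))
    attrs.update(read_native(path))
    return attrs
```

The caller never needs to know which of the two sources an attribute came
from, which is the point of hiding both behind one library call.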

> I had the idea that the extended attributes could be like mail or http
> headers. That is, the name of the attribute followed by a colon ':' and
> the data. After all attributes there would be a blank line and then the
> encapsulated file.
> The extended attributes can also be serialized in a separate file that
> must come with the file it refers to.

Yes.

I don't think it's a good idea to store the data-file and the xattrs
together in a single file. This would break all known applications on
earth. It's simply not viable.

Also, consider that xattr values may contain binary data, so the format
has to handle it, for instance by escaping (if the format is to be
human-readable). I suppose it should also default to UTF-8 for text
strings, so that non-ASCII characters are not escaped as binary data. For
a binary format, it is enough to prefix the data with its length and dump
the raw bytes into the file. This is not as nice for users who want to
inspect the file by hand, though.
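
Both options can be sketched briefly. This is only an illustration: the
\xNN escape syntax, the function names, and the 32-bit big-endian length
fields are assumptions of mine, not part of any agreed specification.

```python
import struct


def escape(value: bytes) -> str:
    # Printable UTF-8 text is kept as-is; a backslash is doubled; every
    # other byte (control characters, invalid UTF-8) becomes \xNN.
    out = []
    for ch in value.decode("utf-8", "surrogateescape"):
        if ch == "\\":
            out.append("\\\\")
        elif ch.isprintable():
            out.append(ch)
        else:
            raw = ch.encode("utf-8", "surrogateescape")
            out.append("".join("\\x%02x" % b for b in raw))
    return "".join(out)


def unescape(text: str) -> bytes:
    # Inverse of escape(): turn the escaped text back into raw bytes.
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\\\", i):
            out += b"\\"
            i += 2
        elif text.startswith("\\x", i):
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


def pack_record(name: str, value: bytes) -> bytes:
    # Binary alternative: two length fields, then the raw name and value.
    encoded_name = name.encode("utf-8")
    header = struct.pack(">II", len(encoded_name), len(value))
    return header + encoded_name + value
```

With the text variant, a value such as "café" stays readable in the file,
while a NUL byte shows up as \x00. The binary variant needs no escaping at
all, at the cost of not being editable by hand.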

What would the separate file be called? Should there be one metadata-file for
each regular file (that contains metadata)? How about simply appending a
".metadata" extension? It is also possible to prefix with a dot to hide
this from most applications, but I don't know if this is desirable. The
library could check both when reading, but write out the file with a dot
prefixed or not based on user preference.
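
The lookup rule could be sketched roughly like this. The function names
are hypothetical, and checking the visible name before the hidden one is
an arbitrary choice on my part.

```python
import os


def sidecar_candidates(path, suffix=".metadata"):
    """Return the visible and the dot-prefixed sidecar name for path."""
    directory, name = os.path.split(path)
    return [os.path.join(directory, name + suffix),
            os.path.join(directory, "." + name + suffix)]


def find_sidecar(path):
    """When reading, accept whichever variant exists (visible one first)."""
    for candidate in sidecar_candidates(path):
        if os.path.isfile(candidate):
            return candidate
    return None


def sidecar_for_writing(path, hidden=False):
    """When writing, pick the variant according to user preference."""
    visible, dotted = sidecar_candidates(path)
    return dotted if hidden else visible
```

For a file docs/report.txt this yields docs/report.txt.metadata and
docs/.report.txt.metadata as the two places to look.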

> So that specification would be implemented by unix commands like cp or
> filemanagers. Then it would permit us to use extended attributes
> knowing that they would be preserved and reliable.

This is the hardest part. You simply cannot hope to implement xattrs in
all programs, and so they might easily get lost silently anywhere along
the way (imagine downloading a file (that HAS xattrs) to your hard disk,
then copying it to a FAT-formatted pen-drive, opening it on an older Linux
distribution, etc.).

For this reason, though, no program should ever RELY on xattr metadata
being present. And no valuable data should be stored as xattrs. Xattrs
should be considered volatile.

I think that, to give xattrs the best chance of being preserved (or
simply used) in as many programs as possible, there has to be a repository
of patches for those programs. A lot of program authors most likely won't
add support for xattrs or serialized xattrs to their "master" program just
like that, especially if it relies on external libraries or even simply
Linux-specific system calls (ah, another reason to make a library
wrapper).

> Also, what about using extended attributes to cache the guessed file
> type (guessed using libmagic and the extension)? Then when a file
> manager or any other application wants to know the file type, it can
> use this value instead of guessing again.

That is quite possible. Also, in many cases, you KNOW the type/encoding of
a file (the HTTP server always sends this information, although it *might*
be a guess, too), even though there is no way to save this to disk along
with the file.

> I also thought that extended attributes could store the size of a
> directory. For example when I want to scan the filesystem with
> utilities like du or Baobab, they would store in extended attributes
> the directory size along with the time when it was measured.
> Afterwards, when they want to know the directory size, they could just
> compare the date in the extended attribute with the modification date
> of the folder and, if they differ, measure the directory size again.
> If not, they could just use the stored value directly.

I think this would be hard to make work reliably in reality. Do all
filesystems update the modification date of all parent directories when a
file changes? Although, yes, it would be great to have this. It takes ages
to scan a whole disk like this normally.
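
The question can be probed on any given filesystem with a small
experiment like the sketch below (the function name is made up). It grows
a file two levels down and checks whether the top directory's
modification time moved.

```python
import os
import tempfile


def parent_mtime_changes(extra=b"more data"):
    """Return True if growing a nested file bumps the top dir's mtime."""
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "sub")
        os.mkdir(sub)
        target = os.path.join(sub, "file.bin")
        with open(target, "wb") as f:
            f.write(b"initial")
        before = os.stat(top).st_mtime_ns
        with open(target, "ab") as f:   # the file grows in place
            f.write(extra)
        after = os.stat(top).st_mtime_ns
        return before != after
```

As far as I know, POSIX-style filesystems only update a directory's mtime
when an entry is added, removed or renamed directly inside it, so I would
expect this to return False there. In that case a size cache validated
only against the directory's own mtime would go stale unnoticed.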

> I think extended attributes should be better integrated into the
> desktop. These are just a few ideas. What do you think about it?

It's great. We need ideas like these. Keep them coming! :-D

> Mildred

Vegard
