Common spec/interface for file metadata

Benedikt Meurer benny at xfce.org
Mon Sep 5 18:30:30 EEST 2005


Jamie McCracken wrote:
>> This would still be a performance problem for fast file managers, and it
>> would cause unnecessary load on the metadata implementation. Think of a
>> medium-size folder (around 1000 files). When the file manager enters the
>> directory it can display up to 50 files at once, and so it doesn't need
>> to know the metadata for the other 950 files until the user scrolls down
>> to the last file (slow scrolling in this case, so every file's view
>> item/row receives an expose event). Nevertheless, the "metadata daemon"
>> would need to fetch the data for all 1000 files and transfer them, even
>> tho the file manager needs only 5% of them.
> 
> shouldn't be that bad for btree based databases as they burst read the
> file's metadata anyhow. For retrieval of any metadata it can be
> asynchronous so it shouldn't slow anything down at all.

Even if the data lookup is fast enough (which is a really optimistic
estimate), you'll still need to transfer the request from the file
manager to the daemon and send the answer back to the file manager (and
the answer for 1000 files, even if it only includes the URI and the
MimeType, can be quite large). And, once the answer is received, the
file manager will need to go through its list of files and associate
the received data with the files, which also takes time (though that's
not the bottleneck).
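
To make the "only 5%" point concrete, here is a rough Python sketch of a view that looks up metadata per row, and only once that row is actually exposed. All names in it (LazyMetadataView, on_expose, the lookup callable) are made up for illustration, and the lookup function stands in for whatever backend ends up being used.

```python
from typing import Callable, Dict, List


class LazyMetadataView:
    """Fetch metadata for a row only when that row becomes visible."""

    def __init__(self, uris: List[str], lookup: Callable[[str], dict]):
        self.uris = uris
        self.lookup = lookup  # hypothetical backend lookup, one URI at a time
        self.cache: Dict[str, dict] = {}

    def on_expose(self, first_row: int, last_row: int) -> List[dict]:
        """Return metadata for rows first_row..last_row (inclusive)."""
        result = []
        for uri in self.uris[first_row:last_row + 1]:
            if uri not in self.cache:
                # First time this row is shown: ask the backend and remember it.
                self.cache[uri] = self.lookup(uri)
            result.append(self.cache[uri])
        return result


# Usage: a folder with 1000 files, of which the first 50 are visible.
requested = []


def fake_lookup(uri: str) -> dict:
    requested.append(uri)
    return {"uri": uri}


view = LazyMetadataView([f"file:///tmp/f{i}" for i in range(1000)], fake_lookup)
view.on_expose(0, 49)
print(len(requested))  # 50 lookups, not 1000
```

This only pays off if a single lookup is cheap, which is exactly why per-file RPCs are a problem here.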

> But what metadata would you need when loading a directory? The only one
> I can think of is MimeType perhaps which is currently async in Nautilus.

The file manager doesn't need to ask a "metadata system" for the MIME
type, but can figure out the MIME type itself (usually way faster than
an RPC, as in 90% of all cases the MIME-type can be determined from the
file's name).
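
As a small illustration of name-based detection, here is a sketch using Python's standard mimetypes module. It is only a stand-in for the glob-based lookup a file manager would do itself; the helper name mime_from_name and the fallback type are my own choices.

```python
import mimetypes


def mime_from_name(filename: str, default: str = "application/octet-stream") -> str:
    """Guess the MIME type from the file name alone, without opening the file."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # guess_type() returns None when the name gives no hint; fall back to a
    # generic type (a real file manager would sniff the content at this point).
    return mime_type if mime_type is not None else default


print(mime_from_name("notes.txt"))
print(mime_from_name("no_extension"))
```

No IPC is involved: the lookup is an in-process table match on the file name.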

If there's nothing else besides the MIME-type that can be queried for a
file and displayed in a detailed list view (or an extended icon view),
then the file manager won't need to access the "metadata system" at all.
The properties dialog can - if required - load a plugin that does the
metadata query on-demand.

>> What's required from a file manager's POV is a fast way to look up the
>> metadata available for a certain URI w/o much overhead (e.g. w/o any
>> RPCs). Perhaps an mmap()able file or an SQLite database. Or - for the
>> brave - store it in the extattrs of the file (tho this is probably not
>> the way to go as some file systems limit the size of the data stored
>> within the extended attributes of a file).
> 
> The problem is then how do you take advantage of existing frameworks
> under development like Beagle, Kat and Tenor. They are the ones that are
> producing the metadata in the first place and they each store it in
> their own databases (Beagle uses Lucene's DB, Kat SQLite3, Tenor
> Postgres and my own implementation of a metadata framework will use the
> embedded MySQL lib) so it's kind of difficult to standardise it
> in-process plus you also have a potentially huge dependency list which
> no platform would accept in its core. Unfortunately I can't see any other
> way round this but IPC.

IMHO, the proper way to address this issue is to standardize the
metadata backend instead of the interface. If everybody used SQLite
(for example), then it would be easy to access the metadata from every
application without the need to do any IPC. And it wouldn't matter if
the metadata index was generated by Beagle, Kat, or whatever other
systems might exist today. You would have one single metadata storage,
and every application that wants to access the metadata can simply link
to SQLite and read from the database (it might be a good idea to provide
a simple wrapper library with get_meta_data(), etc.). I don't care if
it's SQLite, Berkeley DB or whatever, but it's IMHO important to have ONE
easy-to-use and fast metadata backend, not one backend per vendor.
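
To give an idea of how thin such a wrapper could be, here is a sketch in Python on top of the standard sqlite3 module. The table layout (one row per URI and property name), the file location and every function name except get_meta_data() are assumptions of mine, not an agreed format.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: one row per (uri, name) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    uri   TEXT NOT NULL,
    name  TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (uri, name)
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open the shared metadata database, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def set_meta_data(conn: sqlite3.Connection, uri: str, name: str, value: str) -> None:
    """Writer side: what an indexer would call."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO metadata (uri, name, value) VALUES (?, ?, ?)",
            (uri, name, value),
        )


def get_meta_data(conn: sqlite3.Connection, uri: str) -> Dict[str, str]:
    """Reader side: all known properties for one URI, fetched in-process."""
    rows = conn.execute("SELECT name, value FROM metadata WHERE uri = ?", (uri,))
    return dict(rows.fetchall())


# Usage with an in-memory database; a real store would live in a shared file.
conn = open_store(":memory:")
set_meta_data(conn, "file:///home/user/song.ogg", "artist", "Example Artist")
print(get_meta_data(conn, "file:///home/user/song.ogg"))
```

The reader only needs the database file and the schema, so it doesn't matter which indexer wrote the rows.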

Benedikt


