Common spec/interface for file metadata

Mon Sep 5 17:22:42 EEST 2005

Benedikt Meurer wrote:
> Jamie McCracken wrote:
> 
>>>While such an API is simple and works for some apps its certainly not
>>>gonna be usable for applications like file managers or indexers. Calling
>>>a separate out of process RPC for each file while loading a directory
>>>would totally kill performance. 
>>
>>If it were needed to be called on each file while loading a directory
>>(which would only be for limited things like the mime type) then we
>>could also have an API to return a specific metadata for each file in a
>>directory in one shot.
>>
>>EG
>>
>>GetMimeTypesForFilesInFolder
>>  input DBUS_TYPE_STRING s (the folder uri)
>>  output DBUS_TYPE_DICT  a{ss} (the metadata as filename, mimetype)
>>
>>Most of the other metadata would be retrieved on demand by users
>>requesting to see additional metadata for a file so the previous API
>>should suffice for that.
> 
> 
> This would still be a performance problem for fast file managers, and it
> would cause unnecessary load on the metadata implementation. Think of a
> medium-size folder (around 1000 files). When the file manager enters the
> directory it can display up to 50 files at once, and so it doesn't need
> to know the metadata for the other 950 files until the user scrolls down
> to the last file (slow scrolling in this case, so every file's view
> item/row receives an expose event). Nevertheless, the "metadata daemon"
> would need to fetch the data for all 1000 files and transfer them, even
> tho the file manager needs only 5% of them.

That shouldn't be too bad for B-tree based databases, as they burst-read
the files' metadata anyhow. Retrieval of any metadata can also be
asynchronous, so it shouldn't slow anything down at all.
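
To make the asynchronous idea concrete, here is a rough Python sketch. It
is only an illustration under assumptions: the function names, the
FAKE_DAEMON_DB table and the asyncio.sleep() call standing in for the IPC
round trip are all made up, and no real daemon or D-Bus binding is used.

```python
import asyncio

# Hypothetical stand-in for the metadata daemon's database.
FAKE_DAEMON_DB = {
    "notes.txt": "text/plain",
    "photo.png": "image/png",
}


async def get_mime_types_for_files_in_folder(filenames):
    """Assumed batch call: one request in, one {filename: mimetype} dict out."""
    await asyncio.sleep(0)  # stands in for the single IPC round trip
    return {
        name: FAKE_DAEMON_DB.get(name, "application/octet-stream")
        for name in filenames
    }


async def load_directory(filenames):
    # Start the batched lookup without waiting for the reply.
    pending = asyncio.create_task(get_mime_types_for_files_in_folder(filenames))
    # The file manager can list the names straight away, types unknown.
    rows = {name: None for name in filenames}
    # Fill in the mime types once the single reply arrives.
    rows.update(await pending)
    return rows


def demo():
    return asyncio.run(load_directory(["notes.txt", "photo.png", "mystery.bin"]))
```

The point of the sketch is that the directory listing never waits on a
per-file call: there is one request for the whole folder, and the view is
updated when the reply comes back.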

But what metadata would you need when loading a directory? The only one
I can think of is perhaps the MIME type, which Nautilus currently
fetches asynchronously.

I figured that metadata as a whole would only be retrieved by a
"properties" dialog (as is the case in the current Nautilus), so
there's no need to pull down all metadata for all files when reading in
a directory.

> 
> What's required from a file managers POV is a fast way to lookup the
> meta data available for a certain URI w/o much overhead (e.g. w/o any
> RPCs). Perhaps an mmap()able file or an SQLite database. Or - for the
> brave - store it in the extattrs of the file (tho this is probably not
> the way to go as some file systems limit the size of the data stored
> within the extended attributes of a file).

The problem then is how you take advantage of existing frameworks under
development like Beagle, Kat and Tenor. They are the ones producing the
metadata in the first place, and each stores it in its own database
(Beagle uses Lucene's DB, Kat SQLite 3, Tenor Postgres, and my own
implementation of a metadata framework will use the embedded MySQL
library), so it's kind of difficult to standardise it in-process. You
would also have a potentially huge dependency list, which no platform
would accept in its core. Unfortunately I can't see any other way round
this but IPC.
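
As a rough sketch of what I mean (Python, purely illustrative): the
daemon exposes one common call, and each indexer sits behind it with its
own store. MetadataBackend, InMemoryBackend and handle_request are
hypothetical names, and the in-memory dict only stands in for a real
indexer's database.

```python
from abc import ABC, abstractmethod


class MetadataBackend(ABC):
    """Hypothetical common interface; each indexer keeps its own store."""

    @abstractmethod
    def get_metadata(self, uri):
        """Return a dict of metadata keys to values for one URI."""


class InMemoryBackend(MetadataBackend):
    """Stand-in for a real indexer's database."""

    def __init__(self, records):
        self._records = records

    def get_metadata(self, uri):
        # Copy the record so callers cannot modify the backend's store.
        return dict(self._records.get(uri, {}))


def handle_request(backend, uri):
    # What the daemon side of the IPC call would do: the caller only ever
    # sees a plain dict, never the backend's database library.
    return backend.get_metadata(uri)
```

A client then depends only on the IPC call and the shape of the reply,
not on whichever database library the backend happens to link against.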

> 
> Benedikt
> 
> 

-- 
Mr Jamie McCracken
http://www.advogato.org/person/jamiemcc/