[Xesam] Metadata Storage Daemon

Wed Jan 16 13:16:41 PST 2008

On 16/01/2008, Sebastian Trüg <strueg at mandriva.com> wrote:
> On Wednesday 16 January 2008 10:21:18 Kevin Kubasik wrote:
> > OK, well the obvious agreement is a need for time/change tracking, I
> > added a dbus signal called on inserts and a method to get all new
> > triples since a specified timestamp. As for file monitoring, while a
> > Gnome-wide service would be nice, I think that it is outside the scope
> > of a metadata daemon (personally, open to more discussion on this).
> >
> > I think that a rudimentary triple store (roughly like what I have
> > produced here) is a great _base_ for what we are all more or less
> > talking about. I think that the pushes for more searching/indexing
> > capabilities of the data here are missing the point; this is more of a
> > simple storage engine. Powerful desktop search engines like Beagle and
> > Tracker can now both index the same stored metadata.
>
> IMHO the indexing should be part of the store. And the search engines should
> then use the store to query the data. Thus, we would have these components:
>
> * Indexer (or better: analyser)
>   analyses files and writes the data into the store
>
> * Store
>   Simple data store for triples (or quadruples) with a proper RDF API (like
>   Soprano, for example ;) for advanced queries and a simpler API to perform stuff like:
>   - getAllProperties( uri resource )
>   - setProperty( uri resource, uri property, value )
>   and so forth, which handle time stamping and meta-meta-data updating
>   automatically.
>   This store also indexes the data and provides a query API which can be used
>   by search engines. This query API is low level and not intended for the
>   end user (I would opt for SPARQL here but I think you know that ;)
>
> * Search client
>   Creates queries to the store from user queries.
>   (This is what has been described already in XesamQueryLanguage)
>   "Final" search clients would then be using this service for queries.
>   Thus, searching means three steps:
>      user GUI -> search client service -> Store
>
> * File watch service
>   Watches file systems for changes and updates the metadata accordingly.
>
> I think it is important to keep the data in one place here. There is no reason
> to keep separate stores and indexes for data from file analysers and from
> user input (like tags) or any other application that likes to store
> something.
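
The Store API Sebastian describes (getAllProperties, setProperty, with time stamping handled automatically), together with the insert signal and "all new triples since a timestamp" method Kevin mentions, can be sketched as a tiny in-memory triple store. This is a minimal Python illustration under stated assumptions, not Kevin's daemon or Soprano's API. The class and method names are hypothetical, the dbus insert signal is modelled as a plain callback list, and each property is treated as single-valued, whereas real RDF allows several values per property.

```python
import time
from typing import Callable, Dict, List, Tuple

# A triple is (resource URI, property URI, value).
Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory sketch of the 'Store' role."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # (resource, property) -> (value, last-modified timestamp);
        # the timestamp is the automatically maintained meta-meta-data.
        self._data: Dict[Tuple[str, str], Tuple[str, float]] = {}
        self._listeners: List[Callable[[Triple], None]] = []

    def connect_inserted(self, callback: Callable[[Triple], None]) -> None:
        # Stand-in for the dbus signal emitted on inserts.
        self._listeners.append(callback)

    def set_property(self, resource: str, prop: str, value: str) -> None:
        # Time stamping happens here, so callers never do it themselves.
        # Assumption: single-valued properties, so a new value replaces the old.
        self._data[(resource, prop)] = (value, self._clock())
        for callback in self._listeners:
            callback((resource, prop, value))

    def get_all_properties(self, resource: str) -> Dict[str, str]:
        return {p: v for (r, p), (v, _) in self._data.items() if r == resource}

    def triples_since(self, timestamp: float) -> List[Triple]:
        # Every triple inserted or updated strictly after `timestamp`.
        return [(r, p, v) for (r, p), (v, t) in self._data.items() if t > timestamp]
```

Passing the clock in as a parameter keeps the time stamping automatic for callers while letting a test feed in fixed timestamps.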

I think this Unholy Quartet of Desktop Metadata (UQDM) is spot on -
except for some details on the roles each piece should play (yes,
there is a far-fetched pun buried here).

 * Indexer
I really think we should have an external index with extracted data
separate from user-generated metadata, as Jamie also notes elsewhere
in this thread. One should be able to nuke the index from the CLI and
not lose a single bit of metadata.

One could have a fifth role, Analyzer/Crawler, that feeds extracted
data to the indexer.
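
The "nuke the index, lose nothing" property follows if the index is built only from two read-only inputs: data the analyzer re-extracts from files and user metadata held by the store. The Python sketch below is a hypothetical in-memory illustration. The names `Index`, `rebuild_index` and the `Analyzer` callable are invented here, and a real indexer would persist its postings and use a proper tokenizer rather than `str.split`.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical analyzer: maps a file path to extracted (property, value) pairs.
Analyzer = Callable[[str], Iterable[Tuple[str, str]]]


class Index:
    """Disposable word index: every entry can be regenerated."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}  # word -> resources containing it

    def add(self, resource: str, text: str) -> None:
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(resource)

    def search(self, word: str) -> Set[str]:
        return self.postings.get(word.lower(), set())


def rebuild_index(files: Iterable[str], analyzer: Analyzer,
                  user_metadata: Dict[str, Dict[str, str]]) -> Index:
    # Both inputs are only read, never written: throwing the index away
    # cannot touch the user's annotations kept in the store.
    index = Index()
    for path in files:
        for _prop, value in analyzer(path):
            index.add(path, value)
    for resource, props in user_metadata.items():
        for value in props.values():
            index.add(resource, value)
    return index
```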

 * Store
Keeps the holy bits secure. Applications feed user data here. User
data is defined as anything implicitly or explicitly derived from user
actions. The indexer indexes the contents of this store. The store
only has rudimentary query support.

One could add a separate dbus interface to the store providing richer
SPARQL query support.

The Store should be rock solid and easy to back up. Some people are
going to spend countless hours annotating their huge music and/or
photo collections. If they lose their data just once, we lose them
as users forever.

In the long term we might want a shared backup format, for example
XML or something similar (a huge file, yes, but it compresses really well).
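
As one possible shape for such a backup, the sketch below writes every triple to gzip-compressed XML and reads it back using only the Python standard library. The element and attribute names (`metadata`, `triple`, `subject`, `property`) are made up for illustration; no agreed-upon format is implied.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]


def export_backup(triples: List[Triple], path: str) -> None:
    # Hypothetical layout: <metadata><triple subject=".." property="..">value</triple>...
    root = ET.Element("metadata")
    for subject, prop, value in triples:
        ET.SubElement(root, "triple", subject=subject, property=prop).text = value
    # ElementTree escapes &, < and quotes; gzip handles the "huge file" part.
    with gzip.open(path, "wb") as f:
        ET.ElementTree(root).write(f, encoding="utf-8", xml_declaration=True)


def import_backup(path: str) -> List[Triple]:
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()
    # An empty value is written as an empty element, whose .text is None.
    return [(t.get("subject"), t.get("property"), t.text or "")
            for t in root.iter("triple")]
```

A plain-text, self-describing dump like this can be inspected with `zcat` and survives changes to the store's internal database format, which is the main argument for it.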

 * Searcher
Exposes the Xesam Search API over the index generated by the Indexer.
This is pretty much what we have now.

 * File Watch Service
A beer to the first one who provides me with a fast, efficient, and
light API that lets me do the following without any weird gotchas:

  watcher = new Watcher ("~/");
  watcher.connect ("files-changed", on_files_changed_callback);
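
The wished-for API is small enough to sketch. The Python version below keeps the `Watcher(path)` / `connect("files-changed", callback)` shape from the snippet above, but it is only a polling illustration built on `os.walk` and `os.stat`. Each call to the hypothetical `poll()` method rescans the whole tree, which is exactly the cost a fast, light watcher built on a kernel facility such as Linux inotify would avoid. It also inherits a classic gotcha: a change that leaves the modification time untouched goes unnoticed.

```python
import os
from typing import Callable, Dict, List, Set


class Watcher:
    """Polling sketch of the wished-for API (not efficient; illustration only)."""

    def __init__(self, root: str) -> None:
        self.root = os.path.expanduser(root)
        self._callbacks: Dict[str, List[Callable[[Set[str]], None]]] = {}
        self._snapshot = self._scan()

    def connect(self, signal: str, callback: Callable[[Set[str]], None]) -> None:
        self._callbacks.setdefault(signal, []).append(callback)

    def _scan(self) -> Dict[str, int]:
        # Map every file under root to its modification time in nanoseconds.
        state: Dict[str, int] = {}
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    state[path] = os.stat(path).st_mtime_ns
                except FileNotFoundError:
                    pass  # deleted between listing and stat
        return state

    def poll(self) -> None:
        # Diff against the previous snapshot: created, deleted or modified files.
        new, old = self._scan(), self._snapshot
        changed = {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
        self._snapshot = new
        if changed:
            for callback in self._callbacks.get("files-changed", []):
                callback(changed)
```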

 * Implementation note:
Since I assume we are all talking about dbus APIs, it really does not
matter from which process these APIs are exposed. Furthermore, I don't
think it is something the Xesam spec should enforce.

If one implements several of these interfaces in one process, it could
be easy for the daemon to check: "Who owns org.freedesktop.xesam.index?
Uh, that is me. I might as well just short-circuit dbus and make
native method calls."
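
The short-circuit can be sketched as a small dispatch table consulted before every call. In the Python sketch below, `ServiceRegistry`, `export` and `call` are hypothetical names, and `remote_call` stands in for whatever dbus binding is actually used. A real daemon would more likely decide "that is me" by comparing the owner returned by the bus's `GetNameOwner` method with its own unique connection name; here the check is simply whether the name was registered locally.

```python
from typing import Any, Callable, Dict


class ServiceRegistry:
    """Hypothetical 'is that me?' check made before going over dbus."""

    def __init__(self, remote_call: Callable[..., Any]) -> None:
        # remote_call(bus_name, method, *args) stands in for a real dbus proxy call.
        self._remote_call = remote_call
        self._local: Dict[str, Any] = {}

    def export(self, bus_name: str, obj: Any) -> None:
        # Record that this process implements the interface behind bus_name.
        self._local[bus_name] = obj

    def call(self, bus_name: str, method: str, *args: Any) -> Any:
        obj = self._local.get(bus_name)
        if obj is not None:
            # We own the name ourselves: skip the bus and make a native call.
            return getattr(obj, method)(*args)
        return self._remote_call(bus_name, method, *args)
```

Callers use `call` the same way whether or not the service lives in their own process, which is why, as argued above, the spec does not need to dictate process layout.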

 * Conclusion
I think I might actually be more in favor of an Unholy Quintet:
Analyzer, Index, Store, Searcher, Watcher.

Cheers,
Mikkel