[Xesam] Metadata Storage Daemon

Thu Jan 17 00:42:15 PST 2008

On Wednesday 16 January 2008 22:16:41 Mikkel Kamstrup Erlandsen wrote:
> On 16/01/2008, Sebastian Trüg <strueg at mandriva.com> wrote:
> > On Wednesday 16 January 2008 10:21:18 Kevin Kubasik wrote:
> > > OK, well the obvious agreement is a need for time/change tracking, I
> > > added a dbus signal called on inserts and a method to get all new
> > > triples since a specified timestamp. As for file monitoring, while a
> > > Gnome-wide service would be nice, I think that it is outside the scope
> > > of a metadata daemon (personally, open to more discussion on this).
> > >
> > > I think that a rudimentary triple store (roughly like what I have
> > > produced here) is a great _base_ for what we are all more or less
> > > talking about. I think that the pushes for more searching/indexing
> > > capabilities of the data here are missing the point; this is more of a
> > > simple storage engine. Powerful desktop search engines like Beagle and
> > > Tracker can now both index the same stored metadata.
> >
> > IMHO the indexing should be part of the store. And the search engines
> > should then use the store to query the data. Thus, we would have these
> > components:
> >
> > * Indexer (or better: analyser)
> >   analyses files and writes the data into the store
> >
> > * Store
> >   Simple data store for triples (or quadruples) with a proper RDF API
> >   (like Soprano, for example ;) for advanced queries and a simpler API to
> >   perform stuff like:
> >   - getAllProperties( uri resource )
> >   - setProperty( uri resource, uri property, value )
> >   and so forth, which handle time stamping and meta-meta-data updating
> >   automatically.
> >   This store also indexes the data and provides a query API which can be
> >   used by search engines. This query API is low level and not intended for
> >   the end user (I would opt for SPARQL here but I think you know that ;)
> >
> > * Search client
> >   Creates queries to the store from user queries.
> >   (This is what has been described already in XesamQueryLanguage)
> >   "Final" search clients would then be using this service for queries.
> >   Thus, searching means three steps:
> >      user GUI -> search client service -> Store
> >
> > * File watch service
> >   Watches file systems for changes and updates the metadata accordingly.
> >
> > I think it is important to keep the data in one place here. There is no
> > reason to keep separate stores and indexes for data from file analysers
> > and from user input (like tags) or any other application that likes to
> > store something.
>
> I think this Unholy Quartet of Desktop Metadata (UQDM) is spot on -
> except for some details on the roles each piece should play (yes,
> there is a far fetched pun buried here).
>
>  * Indexer
> I really think we should have an external index with extracted data
> separate from user generated metadata - as Jamie also notes elsewhere
> in this thread. One should be able to nuke the index from the CLI and
> not lose one single bit of metadata.

I don't really get the advantage of an external index while there are 
disadvantages like less query power: you cannot combine full text queries 
with relation queries for example.
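
To illustrate what such a combined query means, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the in-memory
triple list, the `ex:` property names, and the function names are hypothetical
and not part of Xesam, Soprano, or any existing store. It only shows that a
full text match and a relation match can be answered together when both live
in the same store.

```python
# Hypothetical in-memory store: (subject, predicate, object) triples.
TRIPLES = [
    ("file:///a.txt", "ex:plainTextContent", "notes about the xesam metadata daemon"),
    ("file:///a.txt", "ex:hasTag", "tag:work"),
    ("file:///b.txt", "ex:plainTextContent", "holiday photos and xesam ideas"),
    ("file:///b.txt", "ex:hasTag", "tag:private"),
    ("file:///c.txt", "ex:plainTextContent", "shopping list"),
    ("file:///c.txt", "ex:hasTag", "tag:work"),
]


def full_text_matches(triples, word):
    """Return subjects whose text content contains `word` as a whole token."""
    return {
        s for s, p, o in triples
        if p == "ex:plainTextContent" and word in o.split()
    }


def relation_matches(triples, predicate, obj):
    """Return subjects that have the exact (predicate, obj) relation."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def combined_query(triples, word, predicate, obj):
    """Intersect a full text match with a relation match."""
    hits = full_text_matches(triples, word) & relation_matches(triples, predicate, obj)
    return sorted(hits)


# Files that mention "xesam" AND are tagged as work.
print(combined_query(TRIPLES, "xesam", "ex:hasTag", "tag:work"))
```

With the text index held outside the store, the intersection step would have
to be done by the client across two services.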

> One could have a fifth role of Analyzer/Crawler that feeds extracted
> data to the indexer.
>
>  * Store
> Keeps the holy bits secure. Applications feed user data here. User
> data defined as being implicitly or explicitly derived from user
> actions. The indexer indexes the contents of this base. The storage
> only has rudimentary query support.
>
> One could add an additional dbus interface to the store adding richer
> sparql query support.
>
> The Store should be rock solid and be easy to back up. Some people are
> going to spend countless hours annotating their huge music and/or
> photo collections. If they lose their data just once, we lose them
> as users forever.

full ack.
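
For reference, the simple store API discussed earlier in the thread
(getAllProperties / setProperty with automatic time stamping, a signal on
inserts, and "all new triples since a timestamp") could be sketched roughly as
below. This is only an illustration under stated assumptions: the class and
method names are hypothetical, the store is in memory, callbacks stand in for
the dbus signal, and it keeps a single value per (resource, property) pair,
which a real triple store would not be limited to.

```python
import time
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate value timestamp")


class TripleStore:
    """Toy in-memory store; a real daemon would persist data and use dbus."""

    def __init__(self, clock=time.time):
        self._clock = clock        # injectable so the time stamps are testable
        self._triples = {}         # (subject, predicate) -> Triple
        self._listeners = []       # stand-in for the "inserted" dbus signal

    def connect_inserted(self, callback):
        """Register a callback invoked with each newly stored Triple."""
        self._listeners.append(callback)

    def set_property(self, resource, prop, value):
        """Store a value and time stamp it automatically."""
        triple = Triple(resource, prop, value, self._clock())
        self._triples[(resource, prop)] = triple
        for callback in self._listeners:
            callback(triple)
        return triple

    def get_all_properties(self, resource):
        """Return {property: value} for one resource."""
        return {
            t.predicate: t.value
            for t in self._triples.values()
            if t.subject == resource
        }

    def triples_since(self, timestamp):
        """Return triples stored strictly after `timestamp`, oldest first."""
        newer = [t for t in self._triples.values() if t.timestamp > timestamp]
        return sorted(newer, key=lambda t: t.timestamp)


store = TripleStore()
store.connect_inserted(lambda t: print("inserted:", t.subject, t.predicate))
store.set_property("file:///song.ogg", "ex:rating", 5)
print(store.get_all_properties("file:///song.ogg"))
```

An indexer could call `triples_since()` with the time of its last run to catch
up after being restarted, without the store knowing anything about indexing.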

> In the long term we might want to have a shared backup format, e.g. XML
> or something (huge file, yes, but it compresses really well).
>
>  * Searcher
> Exposes a Xesam Search API into the index generated by the Indexer.
> This is much what we have now.
>
>  * File Watch Service
> A beer to the first one who provides me a fast, efficient, and light
> api that lets me do the following without any weird gotchas:
>
>   watcher = new Watcher ("~/");
>   watcher.connect ("files-changed", on_files_changed_callback);
>
>  * Implementation note:
> Since I assume we are all talking dbus apis it really does not matter
> from which process these APIs are exposed. Furthermore I don't think
> that it is a thing the Xesam spec should enforce.

sure.
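
On the File Watch Service wish above: the API shape itself is easy to sketch,
the hard part is making it fast and light. The Python below is a naive polling
version and explicitly not a claim to the beer. The `Watcher` class, the
`poll()` method, and the use of modification time plus size to detect changes
are all assumptions made for the example; a real service would rather build on
kernel notification such as inotify instead of rescanning the tree.

```python
import os


class Watcher:
    """Naive polling watcher with a connect()-style callback API."""

    def __init__(self, root):
        self.root = os.path.expanduser(root)
        self._callbacks = {"files-changed": []}
        self._snapshot = self._scan()

    def connect(self, signal, callback):
        """Register a callback for a known signal name."""
        self._callbacks[signal].append(callback)

    def _scan(self):
        """Map every file below root to (mtime in ns, size)."""
        snapshot = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished between listing and stat
                snapshot[path] = (st.st_mtime_ns, st.st_size)
        return snapshot

    def poll(self):
        """Rescan, report added/removed/modified paths, return them sorted."""
        current = self._scan()
        changed = sorted(
            path
            for path in current.keys() | self._snapshot.keys()
            if current.get(path) != self._snapshot.get(path)
        )
        self._snapshot = current
        if changed:
            for callback in self._callbacks["files-changed"]:
                callback(changed)
        return changed
```

Usage would be `watcher = Watcher("some/dir")`, then
`watcher.connect("files-changed", on_files_changed_callback)` and a periodic
`watcher.poll()`. Rescanning a whole home directory this way is exactly the
kind of cost the wish above wants to avoid.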

> If one implements several of these interfaces in one process it could
> be easy for the daemon to check "Who owns org.freedesktop.xesam.index,
> uh, that is me. I might as well just short-circuit dbus and make
> native method calls".
>
>  * Conclusion
> I think I might actually be more in favor of an Unholy Quintet:
> Analyzer, Index, Store, Searcher, Watcher.
>
> Cheers,
> Mikkel