[Xesam] Metadata Storage Daemon

Sebastian Trüg strueg at mandriva.com
Thu Jan 17 01:34:38 PST 2008


On Thursday 17 January 2008 10:07:37 Evgeny Egorochkin wrote:
> > On Wednesday 16 January 2008 15:19:52 Jamie McCracken wrote:
> > On Wed, 2008-01-16 at 14:02 +0100, Sebastian Trüg wrote:
> > > On Wednesday 16 January 2008 10:21:18 Kevin Kubasik wrote:
> > > > OK, well the obvious agreement is a need for time/change tracking, so
> > > > I added a D-Bus signal emitted on inserts and a method to get all new
> > > > triples since a specified timestamp. As for file monitoring, while a
> > > > Gnome-wide service would be nice, I think it is outside the scope of a
> > > > metadata daemon (personally, I am open to more discussion on this).
> > > >
> > > > I think that a rudimentary triple store (roughly like what I have
> > > > produced here) is a great _base_ for what we are all more or less
> > > > talking about. I think the pushes for more searching/indexing
> > > > capabilities here are missing the point; this is more a simple
> > > > storage engine. Powerful desktop search engines like Beagle and
> > > > Tracker can now both index the same stored metadata.
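To make the change tracking concrete: a rough Qt/D-Bus sketch of what such an
interface could look like. The interface name, method name, and signal name
below are purely illustrative, not the actual code Kevin committed:

  // Illustrative D-Bus adaptor for a metadata store: one signal emitted on
  // inserts, one method to fetch everything added since a given timestamp.
  #include <QtDBus/QDBusAbstractAdaptor>
  #include <QtCore/QStringList>

  class StoreAdaptor : public QDBusAbstractAdaptor
  {
      Q_OBJECT
      Q_CLASSINFO("D-Bus Interface", "org.freedesktop.xesam.MetadataStore")

  public:
      explicit StoreAdaptor(QObject *parent)
          : QDBusAbstractAdaptor(parent) {}

  public slots:
      // Return all triples inserted since the given UNIX timestamp,
      // serialized e.g. as N-Triples strings. Stub body only.
      QStringList GetNewTriplesSince(qlonglong timestamp)
      {
          Q_UNUSED(timestamp); // a real store would consult its journal here
          return QStringList();
      }

  signals:
      // Emitted whenever new triples are inserted into the store.
      void TriplesInserted(const QStringList &triples);
  };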
> > >
> > > IMHO the indexing should be part of the store, and the search engines
> > > should then use the store to query the data. Thus, we would have these
> > > components:
> > >
> > > * Indexer (or better: analyser)
> > >   analyses files and writes the data into the store
> > >
> > > * Store
> > >   A simple data store for triples (or quadruples) with a proper RDF API
> > >   (like Soprano, for example ;) for advanced queries, and a simpler API
> > >   for operations like:
> > >     - getAllProperties( uri resource )
> > >     - setProperty( uri resource, uri property, value )
> > >   and so forth, which handle time stamping and meta-meta-data updating
> > >   automatically (see the sketch after this list).
> > >   This store also indexes the data and provides a query API which can
> > >   be used by search engines. This query API is low-level and not
> > >   intended for the end user (I would opt for SPARQL here, but I think
> > >   you know that ;)
> > >
> > > * Search client
> > >   Translates user queries into queries against the store.
> > >   (This is what has already been described in XesamQueryLanguage.)
> > >   "Final" search clients would then use this service for queries.
> > >   Thus, searching means three steps:
> > >      user GUI -> search client service -> Store
> > >
> > > * File watch service
> > >   Watches file systems for changes and updates the metadata accordingly.
> > >
> > > I think it is important to keep the data in one place here. There is no
> > > reason to keep separate stores and indexes for data from file analysers,
> > > from user input (like tags), or from any other application that wants
> > > to store something.
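
To make the simple store API described above a bit more concrete, here is a
minimal interface sketch. All class and method names are hypothetical, and
the SPARQL line is only an example of what the low-level query API could
accept:

  // Interface sketch only: names are hypothetical, not an actual Xesam or
  // Soprano API. Timestamping and meta-meta-data updates happen inside the
  // implementation, not in the caller.
  #include <QtCore/QUrl>
  #include <QtCore/QVariant>
  #include <QtCore/QString>
  #include <QtCore/QList>
  #include <QtCore/QHash>

  class MetadataStore
  {
  public:
      virtual ~MetadataStore() {}

      // Return every property/value pair attached to a resource.
      virtual QHash<QUrl, QVariant> getAllProperties(const QUrl &resource) = 0;

      // Set a property; the store records the change time and other
      // meta-meta-data automatically.
      virtual void setProperty(const QUrl &resource, const QUrl &property,
                               const QVariant &value) = 0;

      // Low-level query API for search engines, not end users; e.g. SPARQL:
      //   SELECT ?doc WHERE { ?doc <http://example.org/ns#title> "Job Application" . }
      virtual QList<QUrl> executeQuery(const QString &sparqlQuery) = 0;
  };

A search client service would then sit on top of executeQuery() and translate
XesamQueryLanguage queries into SPARQL (or whatever the store speaks).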
> >
> > Agreed totally (except for the explicit exposure of RDF semantics/SPARQL).
>
> This exposure is not forced upon users, in the sense that only the users of
> the system who care about this will use such an interface. So it's a
> non-issue from the user's POV.
>
> Certainly it makes sense to have a casual coder-friendly interface. One
> nice approach is what is done in Soprano, but there can be others as well.
>
> It's probably possible to use a similar approach to query construction.
>
> I'd ask you to take a look at Soprano when you have time, to see how raw
> RDF+SPARQL can be combined with concepts more familiar to coders without
> really losing much (if any) power.
>
> It is in fact RDF+SPARQL, but modelled in terms of the programming language
> you are using to interface with Soprano, and to me it seems rather
> intuitive...
>
> Something similar to this:
>
> Xesam::Document doc;
> doc.title("Job Application");
> doc.text("sdifghsdfughsdfg");
>
> Xesam::Document inherits (possibly indirectly) from the rdf::Resource
> class, and rdf::Resource has low-level functions to set/remove/query
> arbitrary triples, like getAllProperties() and whatnot. But you don't use
> those unless you really know what you are doing and can't do it using the
> higher-level approach.
>
> Basically you are working with objects of your programming language, with
> inheritance, inferencing, etc. working in an intuitive way.

Actually, you are talking about the Nepomuk lib in kdelibs here, not Soprano.
Soprano is a plain RDF API without this wrapping of RDF classes.
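
To illustrate the difference: with a plain statement-level RDF API (sketched
below with invented rdf:: names, not Soprano's actual classes or signatures),
the same document boils down to raw triples:

  // Hypothetical statement-level API in the spirit of Soprano; the rdf::
  // names are invented for illustration, not Soprano's real classes.
  #include <QtCore/QUrl>
  #include <QtCore/QString>

  namespace rdf {
      class Model {
      public:
          // Add one triple (subject, predicate, literal object).
          // Stub body; a real implementation would persist the triple.
          void addStatement(const QUrl &, const QUrl &, const QString &) {}
      };
  }

  void storeJobApplication(rdf::Model &model)
  {
      const QUrl doc("urn:example:doc1");  // the document resource
      // What doc.title(...) and doc.text(...) expand to at the triple level:
      model.addStatement(doc, QUrl("http://example.org/ns#title"),
                         QString::fromLatin1("Job Application"));
      model.addStatement(doc, QUrl("http://example.org/ns#plainText"),
                         QString::fromLatin1("sdifghsdfughsdfg"));
  }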

> > In Tracker we store user/app-defined metadata in a separate db, but
> > SQLite allows you to construct a virtual database which amalgamates
> > several SQLite db files to create a single virtual db. Where data is
> > stored is an implementation detail, but obviously one place (or virtual
> > place) is more practical.
> >
> > It's a good idea to separate the expendable metadata produced by the
> > indexer from the precious user/app-defined metadata, to prevent any
> > mishaps. Alternatively, backing up and restoring the precious data can
> > also be used in addition to a primary store.
> >
> > Having everything in one physical place with no backup is probably a bit
> > dangerous IMO.
>
> You're right. It makes sense to either keep them separate or have a synced
> backup of the important part of the data.
>
> -- Evgeny
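
For what it's worth, the SQLite amalgamation Jamie mentions is presumably
based on ATTACH DATABASE. A minimal sketch, with invented file and table
names:

  // Amalgamating two SQLite files into one virtual database via
  // ATTACH DATABASE. File and table names here are invented.
  #include <sqlite3.h>

  int main()
  {
      sqlite3 *db;
      if (sqlite3_open("indexer_metadata.db", &db) != SQLITE_OK)
          return 1;

      // The precious user/app-defined store lives in its own file and
      // becomes visible here under the schema name "user".
      sqlite3_exec(db, "ATTACH DATABASE 'user_metadata.db' AS user;",
                   0, 0, 0);

      // Queries can now span both files as if they were one database.
      sqlite3_exec(db,
          "SELECT subject, predicate, object FROM main.triples "
          "UNION ALL "
          "SELECT subject, predicate, object FROM user.triples;",
          0, 0, 0);

      sqlite3_close(db);
      return 0;
  }

Backing up the precious part then amounts to copying (or dumping) just the
user_metadata.db file.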



