[Xesam] Metadata Storage Daemon

Fri Jan 11 14:54:07 PST 2008

On 11/01/2008, Kevin Kubasik <kevin at kubasik.net> wrote:
> Alright, so there was a quick chat in the IRC last night where a few
> of us realized that we wanted a simple metadata storage implementation
> to try and centralize what's going on all crazy-like with several
> different daemons all coming from completely different directions. My
> rough proposal is really 2 part
> 1) A simple metadata storage service over dbus would be quite simple,
> obviously better APIs cost us more time and energy, but the backbone
> of such a system is extremely rudimentary. I propose that we just go
> ahead and write one. No desktop search or filters etc. Just a few
> calls exposed to dbus to store, query and delete Triples (A Combo of
> some uniqueid, data, and the datatype/metadata). At its core this is a
> sqlite db with a little extra work.

Yes, and kudos for providing one! Good idea to use an ORM, why didn't
*I* think of that? :-) An easily hackable foundation with quite a lot
of functionality actually.

> 2) We take what we learn from the simple implementation and build it
> into a Xesam spec for metadata storage. As well as building an
> 'official' Gnome ontology.

Xesam has already put a lot of work into creating a desktop ontology,
as has Nepomuk (with a more advanced one). I think it would be foolish
to duplicate that work, especially since it has proven quite difficult
to create a *good* ontology.

As already pointed out elsewhere in this thread, there are some very
loose scribblings on http://xesam.org/main/XesamIteration2 about a
metadata interface. Feel free to put your thoughts there...

> While the strength of the current Xesam Query spec is a great
> indicator of how planning can design a wonderful system, I think
> metadata is slightly different. Any true store (that reaches the
> universal acceptance needed for ubiquity) needs to be generic, _any_
> metadata about _any_ source, with social rules governing where and how
> data is labeled. Since I had about an hour to kill this evening, I
> sloshed together some python to outline what I am getting at. The
> hodgepodge system I see as most prudent would handle an MP3 as follows

Agreed. A general RDF store (similar to what you provided) certainly
qualifies as generic in my book.

The current idea is to stick with the Xesam ontology for starters and
then allow custom ontology extensions.

> We throw in basic timestamping of all actions and I think we have 90%
> of the desktops metadata storage needs covered. The best part is that
> the footprint would be minuscule, and the code relatively stable.
> While a query system that supports wildcards etc would probably be way
> better, I more just wanted the idea to show. I used SQLObject since it
> makes life painless and I wanted to finish both this e-mail and the
> sample code in under an hour. Combined with proper namespacing of
> applications etc. This is all we really need at the core (maybe a few
> more columns or indexes). Anyways, please share API thoughts so we
> can at least pick a general direction. I would be really interested to
> know a little more about the more elaborate potential use cases.
> Honestly, I see 80% of use being:
> 1) Add lots of attributes for a Uri
> 2) Query for all attributes associated with a Uri or Query for a
> specific attribute associated with a Uri
> 3) Query for Uri sets that have a certain value in a certain
> attribute. * (This starts to venture into the realm of our indexers
> obviously this is a regular use case, and we would need it plenty, I'm
> just noting that any spec we try to make from this should probably
> _count_ on the other desktop searches indexing their metadata, so we
> really just filter on them.)

There is going to be a small overlap of functionality between the
indexer and the storage, but that is no harm I think. The storage
should support very simple select-style methods (along the lines you
present in your python demo).

Having only simple querying mechanisms on the storage should suffice
if we assume that there is an indexer available indexing the storage
(exposing a Xesam Search API).
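
To make "simple select-style methods" concrete, here is a minimal
sketch covering the three use cases listed above. It assumes a single
sqlite table of (uri, attribute, value) triples plus a timestamp. The
class name, method names and attribute keys are hypothetical and are not
taken from Kevin's bzr branch, which uses SQLObject rather than the
plain sqlite3 module used here.

```python
import sqlite3
import time


class TripleStore:
    """Toy triple store; all names are illustrative assumptions."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            " uri TEXT NOT NULL, attribute TEXT NOT NULL,"
            " value TEXT NOT NULL, stamp REAL NOT NULL)"
        )
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS idx_uri ON triples (uri, attribute)"
        )

    def add(self, uri, attributes):
        """Use case 1: store many attributes for one URI."""
        now = time.time()  # basic timestamping of the action
        with self.db:
            self.db.executemany(
                "INSERT INTO triples VALUES (?, ?, ?, ?)",
                [(uri, key, value, now) for key, value in attributes.items()],
            )

    def get(self, uri, attribute=None):
        """Use case 2: all attributes of a URI, or one specific attribute."""
        sql = "SELECT attribute, value FROM triples WHERE uri = ?"
        args = [uri]
        if attribute is not None:
            sql += " AND attribute = ?"
            args.append(attribute)
        return self.db.execute(sql, args).fetchall()

    def find(self, attribute, value):
        """Use case 3: URIs that have a certain value in a certain attribute."""
        rows = self.db.execute(
            "SELECT DISTINCT uri FROM triples"
            " WHERE attribute = ? AND value = ? ORDER BY uri",
            (attribute, value),
        )
        return [row[0] for row in rows]

    def delete(self, uri):
        """Drop every triple stored for a URI."""
        with self.db:
            self.db.execute("DELETE FROM triples WHERE uri = ?", (uri,))
```

Anything beyond exact-match lookups (wildcards, full text) would be left
to an indexer sitting on top of the store.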

> Anyways, the blob of silly test code is in a bzr branch at
> http://kubasik.net/dev/metadata_daemon.kkubasik/
> so feel free to bzr branch
> http://kubasik.net/dev/metadata_daemon.kkubasik/ away.
>
> I know this isn't at all near a full implementation or spec, but I did
> want to get the ball rolling on it, as it seems like a lot of people
> agree that an ultra-discreet (and part of Gnome proper) system for
> storing and querying metadata is in the near future.

I have one question that I have not really thought through myself yet...

Should the storage monitor files? For example, should the store drop
data on files that are deleted? This makes good sense if we talk about
tagging, since I don't want to list files that had the tag "foo" back
when they existed (apps could do a stat-dance to check existence, but I
would rather avoid that).
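
A rough sketch of what dropping data on deleted files could look like if
the store does the stat-dance itself. It assumes a sqlite table named
triples with a uri column holding file:// URIs; the table layout and the
function name are assumptions for illustration. A real daemon would more
likely react to file system notifications than poll like this.

```python
import os
import sqlite3
from urllib.parse import unquote, urlparse


def prune_missing_files(db):
    """Delete triples whose file:// subject is gone; return the dropped URIs."""
    uris = [
        row[0]
        for row in db.execute(
            "SELECT DISTINCT uri FROM triples"
            " WHERE uri LIKE 'file://%' ORDER BY uri"
        )
    ]
    # The stat-dance, done once inside the store instead of in every app.
    gone = [u for u in uris if not os.path.exists(unquote(urlparse(u).path))]
    with db:
        db.executemany("DELETE FROM triples WHERE uri = ?", [(u,) for u in gone])
    return gone
```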

Or, more generally, should it update internal structures when parts of
a graph are modified in a way that must be propagated for the layout to
remain consistent (e.g. cleaning up dead links[1])?
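
As a sketch of the dead-link case: assume the ontology tells the store
which attributes are links, and treat a link as dead when its target URI
has no triples of its own. Both the attribute names below and that
definition of "dead" are assumptions made for illustration, as is the
triples(uri, attribute, value) table.

```python
import sqlite3

# Hypothetical attributes that the ontology would declare to be links.
LINK_ATTRIBUTES = ("example:linksTo", "example:partOf")


def drop_dead_links(db, link_attributes=LINK_ATTRIBUTES):
    """Remove link triples whose target has no triples; return how many."""
    marks = ",".join("?" for _ in link_attributes)
    sql = (
        "DELETE FROM triples WHERE attribute IN (%s)"
        " AND value NOT IN (SELECT uri FROM triples)" % marks
    )
    total = 0
    while True:
        with db:
            removed = db.execute(sql, tuple(link_attributes)).rowcount
        if removed == 0:
            return total
        # Removing a link may leave its subject without triples, which can
        # make other links dead, so repeat until nothing changes.
        total += removed
```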

Cheers,
Mikkel

[1]: The storage could know from the ontology which triples are links.