[Xesam] Iteration two, metadata api

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Thu Dec 4 07:30:10 PST 2008


2008/12/4 Philip Van Hoof <spam at pvanhoof.be>:
> The current metadata storage API has a few problems that we discovered
> while implementing the Turtle file support for Tracker.

Good to get some review on this, then! :-)

> In a way, inserting metadata via a Turtle file is the same thing, the
> difference being that the data comes from libraptor instead of over IPC.
>
> Nonetheless, we are inserting RDF triples, which is the same as what the
> metadata storage API is promising to service ... over IPC (the feature
> of letting you as an external app insert triples about an existing or
> about a new resource).
>
> For this we needed to know, per group of triples that we insert,
> the resource's URI, the rdf:type and the File:Modified predicate.
>
> The API as proposed at http://xesam.org/main/XesamMetadataAPI does not
> require that rdf:type and File:Modified are passed as fields.

The reason that "rdf:type", which is equivalent to the Content category
(or Source category?) in Xesam, is not passed in is that it is
implicit. The idea behind the API was that you can only set metadata
on an existing object, that is, a URL that already has a content and
source category.

The idea behind not passing an mtime is that the API consumer knows
what she is doing. But you are right that the metadata storage may
already be one step ahead, which may or may not be a problem depending
on how quickly the API consumer catches up.

> This means that it would be unimplementable for many Xesam implementers.
> Especially the Xesam implementers using a (decomposed) triple store.
> They'll need the rdf:type to know which table to elect for the insertion
> of the triple.
>
> The File:Modified is needed for collision handling: what if a record
> already exists? How do you know that what is being proposed by the user
> of the XesamMetadataAPI for insertion is more recent than what you
> already had?

I can see the problem, but requiring a timestamp on each and every
triple also seems like a lot of overhead. Maybe it's not a problem...
Or am I missing something?
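
To make the overhead question concrete, here is a minimal sketch of
collision handling that keeps one modification timestamp per resource
rather than one per triple. The names MetadataStore, StaleWrite and the
shape of set() are made up for illustration; they are not part of the
Xesam draft or of Tracker.

```python
class StaleWrite(Exception):
    """Raised when an incoming write is older than what is stored."""


class MetadataStore:
    """Hypothetical store: one 'modified' stamp per resource, not per triple."""

    def __init__(self):
        # uri -> (modified, {field: value})
        self._resources = {}

    def set(self, uri, modified, fields):
        stored = self._resources.get(uri)
        # Reject the write if we already hold something more recent.
        if stored is not None and modified < stored[0]:
            raise StaleWrite(uri)
        merged = dict(stored[1]) if stored else {}
        merged.update(fields)
        self._resources[uri] = (modified, merged)

    def get(self, uri):
        return dict(self._resources[uri][1])


store = MetadataStore()
store.set("maildir://folder/UID001", 100, {"Message:IsRead": "True"})
try:
    # An older write for the same resource is refused.
    store.set("maildir://folder/UID001", 50, {"Message:IsRead": "False"})
except StaleWrite:
    pass
```

With this layout the cost is a single extra value per resource, at the
price of not being able to tell which individual triple is the stale one.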

> For example:
>
> Set (<maildir://folder/UID001>, ["Message:IsRead"], ["True"])
>
> How do you know that the caller of Set is the most recent?
>
> What you need instead is (something like) this:
>
> Set (<maildir://folder/UID001>,
>     "rdf:type", "Message",
>     "Resource:Modified", time(),
>     ["Message:IsRead"], ["True"])
> or
>
> Set (<maildir://folder/UID001>, ["rdf:type", "Message:IsRead"],
>     [time(), "True"]);
>
> But for the last one rdf:type and Message:IsRead would be required
> fields (predicates).

OK, so there are two issues in play here: timestamps, and whether or
not to pass in the content/source categories of the URI in question.

Regarding timestamps I think that you are on to something; buggy
clients may screw things up. I am not sure that adding timestamps to
the API will fix it though... Will the client not just set the
timestamp to the system time just before it submits the metadata? I
know that they should set the timestamps correctly, but we can't
assume this. At least messages on the bus are received in order, so we
have a minimal level of sanity...
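
The worry can be sketched in a few lines. Assume the server applies a
write only if its timestamp is not older than the stored one (the
accept() and replay() helpers below are hypothetical, not anything from
the draft). A client that stamps with the time it actually observed the
data gets its stale write rejected; a client that stamps with the system
time at submission always wins, so the check degenerates to plain
arrival order.

```python
def accept(stored_stamp, incoming_stamp):
    """Accept a write unless it is older than what is already stored."""
    return stored_stamp is None or incoming_stamp >= stored_stamp


def replay(writes):
    """Apply (timestamp, value) writes in arrival order; return the final value."""
    stamp, value = None, None
    for incoming_stamp, incoming_value in writes:
        if accept(stamp, incoming_stamp):
            stamp, value = incoming_stamp, incoming_value
    return value


# Second write is based on data observed at t=10, and says so: rejected.
careful_client = [(20, "True"), (10, "False")]

# Same stale data, but stamped with "now" just before submitting: accepted.
lazy_client = [(20, "True"), (21, "False")]

print(replay(careful_client))
print(replay(lazy_client))
```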

As far as I can see, the rdf:type/content category question is really
about how you perceive the API. If you want to be able to define
metadata on any old uri (without caring whether the object "exists")
you need to pass in content and source categories, but if you only
allow editing metadata on stuff that is already known then you don't
need them. By "exists" I mean that at one point in time Create() or
CreateMany() was used to register the object with the server.
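
A rough sketch of the two views, assuming a decomposed store with one
table per category. The class names, the create()/set() signatures and
the table layout are invented for illustration and do not match the
draft's actual Create() signature.

```python
class UnknownObject(Exception):
    """Raised when metadata is set on a uri the store cannot place."""


class ExplicitStore:
    """Explicit objects: create() registers the category, set() relies on it."""

    def __init__(self):
        self._types = {}   # uri -> category
        self._tables = {}  # category -> {uri: {field: value}}

    def create(self, uri, category):
        self._types[uri] = category
        self._tables.setdefault(category, {})[uri] = {}

    def set(self, uri, fields):
        try:
            category = self._types[uri]  # the type is implicit in the call
        except KeyError:
            raise UnknownObject(uri) from None
        self._tables[category][uri].update(fields)

    def table(self, category):
        return self._tables.get(category, {})


class ImplicitStore(ExplicitStore):
    """Implicit objects: set() carries rdf:type and registers on demand."""

    def set(self, uri, fields):
        fields = dict(fields)
        category = fields.pop("rdf:type", None)
        if uri not in self._types:
            if category is None:
                # Without a type we cannot pick a table for a new uri.
                raise UnknownObject(uri)
            self.create(uri, category)
        # An rdf:type passed for an already known uri is ignored here.
        super().set(uri, fields)


implicit = ImplicitStore()
implicit.set("maildir://folder/UID001",
             {"rdf:type": "Message", "Message:IsRead": "True"})
```

In the first class a separate registration call is mandatory; in the
second it becomes redundant for clients, which is why the explicit
Create() methods could be dropped from the public API.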

I don't feel strongly about either approach. If we go with the
implicit-objects-approach, which is the one you appear to be assuming,
then we should drop the Create() and CreateMany() methods (and we
still miss a way to "delete" objects, as mentioned in the
Considerations section of the draft).

If I recall correctly from the hackfest Evgeny was also in favor of
implicit-objects... Ev?

-- 
Cheers,
Mikkel

