[Xesam] Metadata Storage Daemon

Sun Jan 13 03:07:25 PST 2008

On Sunday 13 January 2008 12:12:34 Mikkel Kamstrup Erlandsen wrote:
> On 13/01/2008, Evgeny Egorochkin <phreedom.stdin at gmail.com> wrote:
> > On Saturday 12 January 2008 23:02:18 Mikkel Kamstrup Erlandsen wrote:
> > > On 12/01/2008, Evgeny Egorochkin <phreedom.stdin at gmail.com> wrote:
> > > > On Saturday 12 January 2008 01:05:38 Mikkel Kamstrup Erlandsen wrote:
> > > > > On 11/01/2008, Sebastian Trüg <strueg at mandriva.com> wrote:
> > > > > > Just my 2 cents:
> > > > > > Soprano has an IMHO very good DBus API [1] for RDF storage which
> > > > > > fulfills all 3 of your requirements below. We already use it for
> > > > > > Nepomuk and it works great. And since Xesam is already using URIs
> > > > > > to identify stuff why not go the extra mile to RDF storage
> > > > > > altogether?
> > > > >
> > > > > I thought Soprano depended on Qt?
> > > >
> > > > This is not a dependency that you can't easily get rid of.
> > > >
> > > > > Anyways, I don't think the RDF quadruples are a good thing to expose
> > > > > directly to the programmers who just want a quick and dirty
> > > > > metadata storage. It is simply just too technical. That does not
> > > > > mean that we cannot use that stuff under the hood though.
> > > >
> > > > Which part of ( URI, property name, property value , timestamp )
> > > > programmers can't understand and why should it be hidden?
> > >
> > > Exactly my point :-)  ( URI, property name, property value , timestamp
> > > ) is fine, but exposing the general Named Graph terminology (and
> > > features)
> >
> > Actually there's nothing more to named graphs than another element added
> > to the triple. So you can differentiate named graphs with namespacing
> > like mtime:/ uri. Using named graphs only for mtime might backfire in the
> > sense that named graphs could be used in other ways like to store
> > provenance info (where a specific triple came from).
>
> That is exactly one problem I have with named graphs. It seems kind of
> arbitrary to allow exactly one "name" per graph. It kind of begs you
> to stick an XML blob in it with both mtime, provenance, and your
> shoe size in centimeters.

You're right in the sense that quads are arbitrary. Any information
represented with quads can just as well be represented with triples, only
with a much larger number of them. So quads are basically a slight relaxation
of the absolute minimalism of triples, which is often beneficial in practice.
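
To make the quads-versus-triples point concrete, here is a minimal Python
sketch. It assumes a reification-style encoding, where a statement node
carries the subject, predicate, object and graph name as separate triples.
The `ex:` property names, the `_:stmt1` node and the `quad_to_triples` helper
are made up for illustration; they are not part of Soprano or Xesam.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def quad_to_triples(quad: Quad, statement_id: str) -> List[Triple]:
    """Encode one quad as plain triples hanging off a hypothetical statement node."""
    subject, predicate, obj, graph = quad
    return [
        (statement_id, "ex:subject", subject),
        (statement_id, "ex:predicate", predicate),
        (statement_id, "ex:object", obj),
        (statement_id, "ex:graph", graph),
    ]


# The graph name uses the mtime:/ namespacing idea from earlier in the thread;
# the number itself is an arbitrary example value.
quad = ("file:///tmp/a.txt", "xesam:title", "Notes", "mtime:/1200222445")
triples = quad_to_triples(quad, "_:stmt1")
print(len(triples))  # one quad becomes four triples in this encoding
```

Other encodings are possible; the point is only that the fourth element buys
compactness, not expressive power.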

It is also possible to use an XML blob. The question is whether this is needed
or not.

> > > in the API is too generic to my taste. If we say that the
> > > triple name is always a timestamp I am ok with it.
> >
> > Actually a generic API is the only one that's really needed, because it is
> > the most powerful. This doesn't exclude having a set of convenience
> > functions to do typical queries or even completely hide the RDFish and
> > SPARQLish nature of the matter for certain users of the technology.
>
> The most powerful and generic API is not always the right one to
> expose. You have to design the API so that the consuming programs also
> get a lot of expressive power and clarity. That is rarely a quality of
> totally generic interfaces.

I didn't say you shouldn't expose simpler/specialized APIs. I said it makes no
sense to intentionally disallow access to the most generic API.
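
A rough Python sketch of what I mean, assuming the ( URI, property name,
property value, timestamp ) layout discussed above. `MetadataStore`, `match`
and `get_value` are hypothetical names for illustration, not the Soprano or
Xesam API: the convenience call is built on the generic one, and both stay
public.

```python
from typing import Iterator, List, Optional, Tuple

# (uri, property name, property value, timestamp)
Quad = Tuple[str, str, str, int]


class MetadataStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self._quads: List[Quad] = []

    def add(self, uri: str, prop: str, value: str, timestamp: int) -> None:
        self._quads.append((uri, prop, value, timestamp))

    def match(self, uri: Optional[str] = None,
              prop: Optional[str] = None) -> Iterator[Quad]:
        """Generic API: None acts as a wildcard for that position."""
        for quad in self._quads:
            if uri is not None and quad[0] != uri:
                continue
            if prop is not None and quad[1] != prop:
                continue
            yield quad

    def get_value(self, uri: str, prop: str) -> Optional[str]:
        """Convenience API: the value with the newest timestamp, or None."""
        matches = list(self.match(uri, prop))
        if not matches:
            return None
        return max(matches, key=lambda q: q[3])[2]


store = MetadataStore()
store.add("file:///tmp/a.txt", "xesam:title", "Draft", 100)
store.add("file:///tmp/a.txt", "xesam:title", "Final", 200)
latest = store.get_value("file:///tmp/a.txt", "xesam:title")
history = list(store.match(prop="xesam:title"))
```

A quick-and-dirty client only ever calls `get_value`; a client that needs the
full history still has `match`.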

> Consider that the following lines of code could be the same:
>
> double val = item.getValue();
> double size = shoe.getShoeSize();

However, for code that deals with T-shirts, item.getValue() is better than
shoe.getShoeSize().

> If you are writing stuff that should really be generic (ie a generic
> RDF backend) then 1 is fine. If you are writing an application to
> manage a shoe store 2 would likely make things a lot clearer. Of course
> this example is exaggerated, but I think the idea is clear.

If your application is managing a shoe store but uses RDF as a backend, it's
unreasonable to completely disallow access to the RDF interface.

> Our target is "Metadata Storage for the Desktop". Not "Generic Named
> Graph Storage" and our API should reflect this.

Metadata Storage for the Desktop is just as ambiguous as a Generic RDF 
Backend.

What metadata are we talking about? Settings? Annotations? How about external 
data sources like IMAP or a FOAF provided by a social networking site? Where 
do we draw the line and should we?

> If absolute generic'ism was the best thing in the world all APIs would
> consist of var arg functions:
>
> Object get (Object obj, ...)
> Object set (Object obj, ...)

That's exactly what template metaprogramming and similar approaches are doing, 
with some success mind you.

As to specific/niche APIs, an ontology is exactly the solution for this. The
RDF backend is generic, but a specific ontology consisting of classes and
properties provides human-friendly solutions for specific problems.

Using a generic RDF backend doesn't mean you really have to operate in the
RDF realm. You can still use constructs like:

int length = aMessage.getLength()

Unless your programming language can't feasibly support such constructs of 
course.

Soprano is a generic RDF backend, yet you still get to manipulate high-level
objects defined by an ontology and not RDF triples or anything low-level like
this (unless you really want to).

Yet Soprano is not tied to any specific ontology, API or approach. Switching
say from FOAF to Xesam is a matter of providing an ontology description file,
which is of no concern to Soprano devs and other users of the system.
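
To illustrate the idea (this is not how Soprano itself does it, only a
minimal Python sketch): the storage below knows nothing but triples, and an
ontology description maps friendly attribute names to property identifiers.
The two dicts stand in for ontology description files; the `Resource` class
and the `XESAM_LIKE` / `FOAF_LIKE` mappings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Stand-ins for ontology description files: friendly name -> property.
XESAM_LIKE = {"title": "xesam:title", "author": "xesam:author"}
FOAF_LIKE = {"name": "foaf:name", "homepage": "foaf:homepage"}


class Resource:
    """High-level object whose attributes are defined by an ontology mapping."""

    def __init__(self, triples: List[Triple], uri: str,
                 ontology: Dict[str, str]) -> None:
        self._triples = triples
        self._uri = uri
        self._ontology = ontology

    def __getattr__(self, name: str) -> Optional[str]:
        # Called only when normal attribute lookup fails.
        if name.startswith("_") or name not in self._ontology:
            raise AttributeError(name)
        prop = self._ontology[name]
        for subject, predicate, value in self._triples:
            if subject == self._uri and predicate == prop:
                return value
        return None  # known property, but no value stored


triples = [
    ("file:///tmp/a.txt", "xesam:title", "Notes"),
    ("http://example.org/me", "foaf:name", "Evgeny"),
]
doc = Resource(triples, "file:///tmp/a.txt", XESAM_LIKE)
person = Resource(triples, "http://example.org/me", FOAF_LIKE)
print(doc.title, person.name)
```

The same `Resource` code serves both ontologies; only the mapping passed in
changes, and the raw triples remain available to anyone who wants them.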

-- Evgeny