[RFC] Metadata access and storage

Wed Sep 7 22:18:41 PDT 2011

On Thu, 2011-09-08 at 01:04 +0200, Anders Feder wrote:
> On 07-09-2011 23:17, Jürg Billeter wrote:
> > I'm currently favoring a more modular approach where we define a core 
> > storage API that is based on the RDF model but is kept much simpler. 
> > That is, I would no longer use SPARQL (or any other query language) on 
> > the lowest level and instead provide a simple CRUD API on the level of 
> > RDF resources.
> Cool, that is very much what I've been aiming for - though in the 
> reverse direction. What I propose is capturing your use case (E-D-S 
> pushing CRUDs to Tracker) and my use case (Tracker pulling triples from 
> E-D-S) in a single, backend agnostic interface. Then one can apply this 
> interface to Tracker, E-D-S, Soprano or whatever backend one sees fit to 
> achieve compatibility with the rest of the desktop.
>
> Here are the primitives I've been operating with:
> 
> InsertStatement (triple)
> MatchStatements (pattern)
> DeleteStatement (triple)
> 
> MatchStatements returns solutions satisfying a basic graph pattern, the 
> other two are self-explanatory.
> 
> What primitives do you imagine would be needed? Are the three above 
> low-level enough?

Basic graph patterns can become very complex if you allow arbitrary SPARQL
filters. I'd say that's not something you want to implement as part of
your application's own data store. I'd start with something like this:

# Insert/replace statements (predicate-object pairs) about uri (subject)
InsertResource (uri, statements)
# Delete all statements about uri
DeleteResource (uri)
# Return all statements about uri
GetResource (uri)
# Return all resources that have been touched since specified point
GetUpdates (since)

The last point is important as it allows application databases/indices
to be kept in sync. The interface should also provide at least a very
limited form of transactions.
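
To make the four calls above concrete, here is a minimal in-memory sketch in
Python. It is only an illustration under assumptions that are not part of the
proposal: the class name ResourceStore, the helper sync, the use of a
monotonically increasing revision counter as the "point" passed to GetUpdates,
and the representation of statements as (predicate, object) tuples are all
hypothetical. Transactions are left out entirely.

```python
class ResourceStore:
    """Hypothetical in-memory store keyed by resource URI."""

    def __init__(self):
        self._resources = {}  # uri -> list of (predicate, object) pairs
        self._touched = {}    # uri -> revision at which it last changed
        self.revision = 0     # assumption: the sync "point" is a counter

    def _touch(self, uri):
        # Record that uri changed at a new revision.
        self.revision += 1
        self._touched[uri] = self.revision

    def insert_resource(self, uri, statements):
        # Insert or replace all statements about uri.
        self._resources[uri] = list(statements)
        self._touch(uri)

    def delete_resource(self, uri):
        # Delete all statements about uri; the deletion itself is an update.
        self._resources.pop(uri, None)
        self._touch(uri)

    def get_resource(self, uri):
        # Return all statements about uri (empty list if unknown or deleted).
        return list(self._resources.get(uri, []))

    def get_updates(self, since):
        # Return URIs of all resources touched after revision `since`.
        return sorted(u for u, rev in self._touched.items() if rev > since)


def sync(store, index, since):
    """Bring `index` (a plain dict) up to date; return the new sync point."""
    point = store.revision
    for uri in store.get_updates(since):
        statements = store.get_resource(uri)
        if statements:
            index[uri] = statements
        else:
            index.pop(uri, None)  # resource was deleted
    return point
```

A consumer that remembers the returned sync point only has to fetch the
resources that changed since its last run, so it does not need to be running
while the application makes its changes.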

Can you describe your use case in more detail? If I understand you
correctly, you'd like an interface that could be used by an
application-independent Tracker miner. However, I don't see how you
could keep Tracker up to date with the above interface without retrieving
all data with MatchStatements on every sync - unless you're saying that
the signals you describe on semantk.org should be used for incremental
updates. The latter would imply that you have to guarantee that the
Tracker miner is running whenever the application is running (to not
miss any signals).

Wouldn't it be much easier to push data from the application to the RDF
store as in my proposal? Or in other words, what advantages do you
see in encouraging applications to provide a generic pull interface
instead of encouraging applications to push their data to a service
using a generic interface?

Jürg