[RFC] Metadata access and storage

Thu Sep 8 01:55:45 PDT 2011

On 08-09-2011 07:18, Jürg Billeter wrote:
> On Thu, 2011-09-08 at 01:04 +0200, Anders Feder wrote:
>> On 07-09-2011 23:17, Jürg Billeter wrote:
>>> I'm currently favoring a more modular approach where we define a core
>>> storage API that is based on the RDF model but is kept much simpler.
>>> That is, I would no longer use SPARQL (or any other query language) on
>>> the lowest level and instead provide a simple CRUD API on the level of
>>> RDF resources.
>> Cool, that is very much what I've been aiming for - though in the
>> reverse direction. What I propose is capturing your use case (E-D-S
>> pushing CRUDs to Tracker) and my use case (Tracker pulling triples from
>> E-D-S) in a single, backend agnostic interface. Then one can apply this
>> interface to Tracker, E-D-S, Soprano or whatever backend one sees fit to
>> achieve compatibility with the rest of the desktop.
>>
>> Here are the primitives I've been operating with:
>>
>> InsertStatement (triple)
>> MatchStatements (pattern)
>> DeleteStatement (triple)
>>
>> MatchStatements returns solutions satisfying a basic graph pattern, the
>> other two are self-explanatory.
>>
>> What primitives do you imagine would be needed? Are the three above
>> low-level enough?
> Basic graph patterns can be very complex if you allow arbitrary SPARQL
> filters. I'd say that's not something you want to implement as part of
> your application's own data store. I'd start with something like this:

I wouldn't require applications to implement basic graph patterns in
full (though full support would be useful for some). Plain sequences of
triple patterns should suffice for most purposes.
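
To make "sequences of triple patterns" concrete, here is a minimal
sketch of what MatchStatements could look like without any filter
support. Everything in it is an assumption for illustration only: the
in-memory list of triples, the "?name" convention for variables, and the
function names are hypothetical and not part of any existing interface.

```python
def is_var(term):
    # Assumption: variables are strings starting with "?"
    return isinstance(term, str) and term.startswith("?")


def match_pattern(triple, pattern, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for value, term in zip(triple, pattern):
        if is_var(term):
            # Bind the variable, or check it against its earlier binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match_statements(store, patterns):
    """Return all variable bindings satisfying every pattern in sequence."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in store:
                extended = match_pattern(triple, pattern, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


store = [
    ("urn:c1", "name", "Alice"),
    ("urn:c1", "email", "a@example.org"),
    ("urn:c2", "name", "Bob"),
]
# Contacts that have both a name and an e-mail address
result = match_statements(store, [("?c", "name", "?n"), ("?c", "email", "?e")])
```

A backend that cannot evaluate such a join efficiently could still
answer each pattern individually and leave the joining to the caller.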

>
> # Insert/replace statements (predicate-object pairs) about uri (subject)
> InsertResource (uri, statements)
> # Delete all statements about uri
> DeleteResource (uri)
> # Return all statements about uri
> GetResource (uri)
> # Return all resources that have been touched since specified point
> GetUpdates (since)

I would map the first three operations above, offered at the API level
(as Carlos suggested), onto the three primitives I suggested at the
transport level. That way, finer granularity remains available at the
transport level (individual triples can be inserted or deleted), while
the application developer is shielded from that complexity at the API
level.
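
A rough sketch of that mapping, under stated assumptions: the
TripleTransport class is a hypothetical in-memory stand-in for the
transport level, None acts as a wildcard in patterns, and I read
"insert/replace" as replacing all existing statements about the uri.
None of these choices are settled.

```python
class TripleTransport:
    """Hypothetical in-memory stand-in for the transport-level interface."""

    def __init__(self):
        self.triples = set()

    def insert_statement(self, triple):
        self.triples.add(triple)

    def delete_statement(self, triple):
        self.triples.discard(triple)

    def match_statements(self, pattern):
        # Assumption: None in a pattern position matches anything
        return [
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        ]


def get_resource(transport, uri):
    """GetResource: all (predicate, object) pairs about uri."""
    return {(p, o) for (_, p, o) in transport.match_statements((uri, None, None))}


def delete_resource(transport, uri):
    """DeleteResource: delete every statement whose subject is uri."""
    for triple in transport.match_statements((uri, None, None)):
        transport.delete_statement(triple)


def insert_resource(transport, uri, statements):
    """InsertResource: replace all statements about uri with the given ones."""
    delete_resource(transport, uri)
    for predicate, obj in statements:
        transport.insert_statement((uri, predicate, obj))


transport = TripleTransport()
insert_resource(transport, "urn:c1", [("name", "Alice"), ("email", "a@example.org")])
insert_resource(transport, "urn:c1", [("name", "Alice B.")])
```

After the second call only the new name remains for urn:c1, which is
where the transaction support Jürg mentions would matter: the delete and
the inserts should not be observable separately.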

>
> The last point is important as it allows application databases/indices
> to be kept in sync. The interface should also provide at least a very
> limited form of transactions.
>
> Can you describe your use case in more detail? If I understand you
> correctly, you'd like an interface that could be used by an
> application-independent Tracker miner. However, I don't see how you
> could keep Tracker up to date with the above interface without retrieving
> all data with MatchStatements on every sync - unless you're saying that
> the signals you describe on semantk.org should be used for incremental
> updates. The latter would imply that you have to guarantee that the
> Tracker miner is running whenever the application is running (to not
> miss any signals).

I had assumed the miner would query for changes when it is loaded and
act on signals while it is running. (Isn't that how miners normally do
their indexing?) You're right, though, that it would make sense to amend
the interface to support "GetUpdates (since)"-type operations.
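
The combination I have in mind could look roughly like the sketch below.
It is only an illustration: AppStore and Miner are hypothetical names,
"since" is modelled as a revision counter rather than a timestamp, plain
callbacks stand in for signals, and deletions are left out entirely.

```python
class AppStore:
    """Hypothetical application store exposing the pull interface."""

    def __init__(self):
        self.resources = {}   # uri -> set of (predicate, object)
        self.touched = {}     # uri -> revision at which it was last touched
        self.revision = 0     # Assumption: "since" is a revision number
        self.listeners = []   # Callbacks standing in for signals

    def insert_resource(self, uri, statements):
        self.revision += 1
        self.resources[uri] = set(statements)
        self.touched[uri] = self.revision
        for listener in self.listeners:
            listener(uri, self.revision)

    def get_resource(self, uri):
        return set(self.resources.get(uri, ()))

    def get_updates(self, since):
        # All resources touched after the given revision
        return sorted(u for u, r in self.touched.items() if r > since)


class Miner:
    """Catches up with GetUpdates on load, then follows signals."""

    def __init__(self, store, last_seen=0):
        self.store = store
        self.last_seen = last_seen
        self.index = {}

    def start(self):
        for uri in self.store.get_updates(self.last_seen):
            self.index[uri] = self.store.get_resource(uri)
        self.last_seen = self.store.revision
        self.store.listeners.append(self.on_changed)

    def on_changed(self, uri, revision):
        self.index[uri] = self.store.get_resource(uri)
        self.last_seen = revision


store = AppStore()
store.insert_resource("urn:a", [("name", "A")])   # Happens before the miner runs
miner = Miner(store)
miner.start()                                     # Catch-up via get_updates
store.insert_resource("urn:b", [("name", "B")])   # Picked up via the callback
```

If the miner persists last_seen, it does not need to be running whenever
the application is: anything missed is fetched on the next start.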

> Wouldn't it be much easier to push data from the application to the RDF
> store as in my proposal? Or in other words, what advantages are you
> seeing by encouraging applications to provide a generic pull interface
> instead of encouraging applications to push their data to a service
> using a generic interface?

There are some situations (outside the immediate scope of Tracker) where 
push is not ideal. I give the example of 'location' under 'Use cases 
<http://semantk.org/use-cases.php>'.

Another example might be DBpedia <http://dbpedia.org/About>. Services 
like this are very relevant sources of information for the desktop, but 
it doesn't make sense to push their whole database into one's local 
Tracker store.

Push is good for information that has to do with the past (e.g. "I 
received an e-mail yesterday") but it neglects the whole realm of 
information that has to do with the present (e.g. "My current location 
is 12.345 67.890").

Another disadvantage of push is that it duplicates information (e.g. one
entry in E-D-S and one in Tracker).

Anders Feder