[Xesam] Practical applicability issues

Wed Jun 18 14:45:43 PDT 2008

2008/6/18 Jamie McCracken <jamie.mccrack at googlemail.com>:
> On Wed, 2008-06-18 at 23:23 +0200, Mikkel Kamstrup Erlandsen wrote:
>> 2008/6/18 Evgeny Egorochkin <phreedom.stdin at gmail.com>:
>> > Hi all.
>> >
>> > I'd like to raise again a long-standing issue of structure support.
>> >
>> > Some of the missing features that require structure support are actually being
>> > _requested by potential users_ (who are encouraged to chime in ;-) :
>> >
>> > * Geo tagging and geo information for addresses;
>> >
>> > * Status and presence information for contacts;
>> >
>> > * Adequate support of iCalendar spec, which is currently more or less
>> > nonexistent.
>> >
>> > Needless to say support means not only storage and retrieval, but
>> > filters/query conditions.
>> >
>> > This can't be resolved merely by introducing some string format for a
>> > field that represents structured data, since useful features such as geo
>> > proximity search won't work.
>> >
>> > I consider this to be a blocker issue that significantly limits applicability
>> > of xesam and interoperability with platforms that don't have structure
>> > support limitations.
>>
>> Hi all,
>>
>> A longish mail. Let me try to give you my understanding of the
>> problem, its implications, some technical notes, and with all these
>> things cleared up I will give you my detailed opinion on this matter.
>> Let me first state that I am a bit torn on this issue. On one side I
>> can see that we have some impractical limitations, and on the other
>> side we already have implementations of Xesam and a stability promise
>> to keep. This is an important issue, so please hang on.
>>
>> No matter if we include this or not, it is important for all to
>> understand the implications. So I urge you to discuss this properly.
>>
>> EXPLANATION:
>> I think some more explanation of the problem would be useful.
>> Say we want to add geo tagging to the address(es) of the Person class.
>> Currently Person has the address-related fields:
>>
>> workPostalAddress (list of strings)
>> homePostalAddress (list of strings)
>>
>> Adding geo info: workGeoN, workGeoE, homeGeoN, homeGeoE. You can see
>> where this is going, from 2 to 6 fields, and Person addresses is not
>> the only place where this scheme plays out. Assume we instead define a
>> struct (which Xesam currently does not support) called Address with
>> the following fields:
>>
>> postalAddress
>> geoN
>> geoE
>>
>> Then person would be much simpler and structured (pun unavoidable):
>>
>> workAddress (struct Address)
>> homeAddress (struct Address)
>>
>> COMPATIBILITY
>> This will add a new data type to the ontology, which will of course
>> break backwards compat. It is not a big thing for clients since they
>> will now just receive a dbus struct and then have to access a given
>> member of that to get at the data they did before. Server side will
>> have to do a little more work, but not necessarily a great deal of it,
>> depending on how they choose to implement structs.
>>
>> We need to be able to query struct members also. This can luckily be
>> done in a backwards compatible way (unless someone employs a very
>> strict parser). The simplest solution is to add a 'member' attribute
>> to the 'field' element. That way you can specify:
>>
>> <contains>
>>   <field name="workAddress" member="postalAddress"/>
>> </contains>
>>
>> Leaving out the 'member' attribute would mean all struct members. This
>> would not work on nested structs though (should we decide to support
>> those too).
>>
>> TECHNICAL
>> I had several technical concerns when I first started thinking about
>> this. But as you will see I think all of them are solvable.
>>
>> The first was how to implement structs in a flat field store (such as
>> Lucene). This is not that tricky. For each struct member in each
>> category the following field name will be unique:
>>
>>   <Cat.name>_<Structname>_<fieldname>
>>
>> For example:
>>
>>   Person_workAddress_postalAddress
>>
>> Then if a query should look at a struct field simply do query
>> expansion like you would on our current hierarchy of fields. When a
>> struct field should be retrieved the server would have to collect the
>> relevant fields and roll them into a real dbus struct.
>>
>> The next problem is our hierarchical query matching (i.e. that a query in
>> xesam:author should also match in child fields of xesam:author): it
>> could be that structs would somehow mess this up. This is also a
>> non-issue as it turns out. If we require that struct members are also
>> full-fledged fields (e.g., continuing our example, the postalAddress
>> member of the Address struct would be a child of
>> xesam:physicalAddress), we would have fully normal query expansion
>> given a query into a struct field (like the query fragment above).
>>
>> Next technical problem - nested structs and lists of structs. While
>> nested structs should work fine as I outline above, I see no clear
>> solution in the query language. So I would probably not recommend
>> this. Then about lists of structs - some implementations, like Tracker
>> I believe, store lists in a different way than normal fields. This
>> might make it tricky to support lists of structs (which we would like
>> if we want to allow multiple home addresses per person).
>>
>> Last issue I've thought about - hit data retrieval. To keep it short I
>> don't think we should change the API or session props. This means that
>> naming a struct field in hit.fields will always retrieve the entire
>> struct. Retrieval of individual struct members is hence not possible.
>>
>> PROS
>>  * Some people close to the project have voiced concern about the
>> complexity and size of the ontology. Structs might induce some more
>> order and coherence.
>>  * Able to be data-compatible with other de facto standards, namely vCard
>>  * Can be added without too much work (it is not relation traversing
>> queries we talk about!)
>>
>> CONS
>>  * Break ontology (a fair bit) and query language (very little)
>>  * Jeopardize project credibility by changing low level stuff like this in an RC
>>  * More work
>>
>> MY OPINION
>> As stated when I started I am greatly concerned about breaking
>> anything at this point. OTOH there is a reason why it is an RC and not
>> 1.0. For me to give +1 it would require:
>>
>>  * We can add it without causing too much work
>>  * All server maintainers give +1
>>  * It has absolute minimal impact on the API
>>  * We document meticulously what changed and how consumers should react
>>  * It addresses (almost) all issues Evgeny has raised
>>
>> I think that what I've outlined above meets these, but I can very well
>> have missed something. What I am most unsure about is whether this
>> model actually solves our issues and whether people are going to have
>> trouble implementing lists of structs.
>
>
> I (and Tracker) don't have a problem with structs per se, but nested
> structs (structs within structs) would cause complications all round in
> Tracker, so if we can limit it to, say, no nested structs then I think it
> would be OK
>
> It should be post xesam 1.0 obviously

How about lists of structs? I.e. structs in your triple store.

To have any real benefit this actually needs to go in 1.0 as it would
require incompatible changes to the ontology. As an alternative
solution we could simply remove the affected fields from the 0.95
ontology and then re-introduce them as structs in 1.X for X>0 (ie ship
a reduced onto in 1.0). Either way would add incompatible changes to
the onto though.

This really all depends on how much of a rush we are in for 1.0. We
already have a versioned release out there so it may not be that big a
problem to delay 1.0 because of this. I am personally not in a rush,
but other people have other needs and we need to clear that up before
deciding anything.
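
To make the flat-store mapping quoted above a bit more concrete, here is
a rough Python sketch. It only assumes the
<Cat.name>_<Structname>_<fieldname> naming scheme and the rule that a
missing 'member' means all struct members. The lookup tables, the
function names, the dict standing in for a flat store, and the sample
values are all made up for illustration; none of it is part of Xesam or
of any existing server.

```python
# Hypothetical sketch: none of these names exist in Xesam or in any server.

# Assumed struct definition, taken from the Address example in this thread.
STRUCT_MEMBERS = {"Address": ["postalAddress", "geoN", "geoE"]}

# Assumed ontology fragment: which Person fields hold an Address struct.
STRUCT_FIELDS = {
    ("Person", "workAddress"): "Address",
    ("Person", "homeAddress"): "Address",
}


def flat_name(category, struct_field, member):
    """Build the flat-store key, e.g. Person_workAddress_postalAddress."""
    return f"{category}_{struct_field}_{member}"


def expand_query_field(category, struct_field, member=None):
    """Expand a <field name=... member=...> reference into flat-store keys.

    Leaving out `member` means "all struct members".
    """
    struct = STRUCT_FIELDS[(category, struct_field)]
    members = STRUCT_MEMBERS[struct]
    if member is not None:
        if member not in members:
            raise KeyError(f"{struct} has no member {member!r}")
        members = [member]
    return [flat_name(category, struct_field, m) for m in members]


def roll_up(flat_doc, category, struct_field):
    """Collect the flat fields of one struct into a tuple in member order.

    A real server would pass this to its D-Bus binding as a struct. Returning
    None for missing members is a choice made by this sketch only.
    """
    keys = expand_query_field(category, struct_field)
    return tuple(flat_doc.get(k) for k in keys)


# Made-up document as it might sit in a flat field store.
doc = {
    "Person_workAddress_postalAddress": "1 Example Street",
    "Person_workAddress_geoN": 55.68,
    "Person_workAddress_geoE": 12.57,
}
print(expand_query_field("Person", "workAddress", "postalAddress"))
print(roll_up(doc, "Person", "workAddress"))
```

Note that hit retrieval always rolls up the whole struct, which matches
the proposal not to allow retrieval of individual struct members.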

Cheers,
Mikkel