[Xesam] Practical applicability issues

Wed Jun 18 14:23:30 PDT 2008

2008/6/18 Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> Hi all.
>
> I'd like to raise again a long-standing issue of structure support.
>
> Some of missing features that require structure support are actually being
> _requested by potential users_(who are encouraged to chime in ;-) :
>
> * Geo tagging and geo information for addresses;
>
> * Status and presence information for contacts;
>
> * Adequate support of iCalendar spec, which is currently more or less
> nonexistent.
>
> Needless to say support means not only storage and retrieval, but
> filters/query conditions.
>
> This can't be resolved merely by introduction of some string format for a
> field that represents structured data, since eg useful features like geo
> proximity search won't work..
>
> I consider this to be a blocker issue that significantly limits applicability
> of xesam and interoperability with platforms that don't have structure
> support limitations.

Hi all,

A longish mail. Let me try to give you my understanding of the
problem, its implications, some technical notes, and with all these
things cleared up I will give you my detailed opinion on this matter.
Let me first state that I am a bit torn on this issue. On one hand I
can see that we have some impractical limitations, and on the other
hand we already have implementations of Xesam and a stability promise
to keep. This is an important issue, so please hang on.

No matter if we include this or not, it is important for all to
understand the implications. So I urge you to discuss this properly.

EXPLANATION:
I think some more explanation of the problem would be useful.
Say we want to add geo tagging to the address(es) of the Person class.
Currently Person has the address-related fields:

workPostalAddress (list of strings)
homePostalAddress (list of strings)

Adding geo info means adding workGeoN, workGeoE, homeGeoN, homeGeoE. You
can see where this is going: from 2 to 6 fields, and Person addresses are
not the only place where this scheme plays out. Assume we instead define a
struct (which Xesam currently does not support) called Address with
the following fields:

postalAddress
geoN
geoE

Then person would be much simpler and structured (pun unavoidable):

workAddress (struct Address)
homeAddress (struct Address)
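
To make the shape concrete, here is a minimal Python sketch of the
proposed struct. The member types are my assumption (a plain string for
postalAddress, floats for geoN/geoE); nothing above fixes them, and the
example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    # Member names follow the proposal; the types are assumptions.
    postalAddress: str
    geoN: Optional[float] = None
    geoE: Optional[float] = None


@dataclass
class Person:
    # Two struct-valued fields replace the six flat fields.
    workAddress: Optional[Address] = None
    homeAddress: Optional[Address] = None


# Made-up example data.
person = Person(homeAddress=Address("1 Example Street", geoN=55.68, geoE=12.57))
```

Adding a new member to Address then extends both workAddress and
homeAddress at once, instead of adding one flat field per address kind.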

COMPATIBILITY
This will add a new data type to the ontology, which will of course
break backwards compat. It is not a big thing for clients since they
will now just receive a dbus struct and then have to access a given
member of that to get at the data they got before. Servers will
have to do a little more work, but not necessarily a great deal of it,
depending on how they choose to implement structs.

We need to be able to query struct members also. This can luckily be
done in a backwards compatible way (unless someone employs a very
strict parser). The simplest solution is to add a 'member' attribute
to the 'field' element. That way you can specify:

<contains>
  <field name="workAddress" member="postalAddress"/>
</contains>

Leaving out the 'member' attribute would mean all struct members. This
would not work on nested structs though (should we decide to support
those too).
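
As a rough sketch of how a server could interpret the 'member'
attribute, assuming it already knows the member names of each struct
(ADDRESS_MEMBERS and members_to_search are hypothetical names, not part
of any existing Xesam implementation):

```python
import xml.etree.ElementTree as ET

# Hypothetical: the members of the proposed Address struct.
ADDRESS_MEMBERS = ["postalAddress", "geoN", "geoE"]


def members_to_search(field_elem, struct_members):
    """Return the struct members a <field> element refers to."""
    member = field_elem.get("member")
    if member is None:
        # No 'member' attribute: the query applies to all struct members.
        return list(struct_members)
    if member not in struct_members:
        raise ValueError("unknown struct member: " + member)
    return [member]


fragment = """
<contains>
  <field name="workAddress" member="postalAddress"/>
</contains>
"""
field = ET.fromstring(fragment).find("field")
selected = members_to_search(field, ADDRESS_MEMBERS)
```

A parser that ignores unknown attributes falls through to the "all
members" branch, which is why only a very strict parser would break.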

TECHNICAL
I had several technical concerns when I first started thinking about
this. But as you will see I think all of them are solvable.

The first was how to implement structs in a flat field store (such as
Lucene). This is not that tricky. For each struct member in each
category the following field name will be unique:

  <Cat.name>_<Structname>_<fieldname>

For example:

  Person_workAddress_postalAddress

Then if a query should look at a struct field simply do query
expansion like you would on our current hierarchy of fields. When a
struct field should be retrieved the server would have to collect the
relevant fields and roll them into a real dbus struct.
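
A minimal Python sketch of this flattening and roll-up, with a dict
standing in for the flat store and a plain tuple standing in for the
dbus struct (both stand-ins, and all function names, are my assumptions;
the middle part of the flat name is taken to be the struct-valued field,
as in the example above):

```python
def flat_name(category, struct_field, member):
    """Build the unique flat field name for one struct member."""
    return f"{category}_{struct_field}_{member}"


def flatten(category, struct_field, struct):
    """Turn one struct value (a dict of members) into flat store fields."""
    return {flat_name(category, struct_field, m): v for m, v in struct.items()}


def roll_up(category, struct_field, members, stored):
    """Collect the flat fields again, in member order, as one tuple."""
    return tuple(stored.get(flat_name(category, struct_field, m)) for m in members)


members = ["postalAddress", "geoN", "geoE"]
stored = flatten("Person", "workAddress",
                 {"postalAddress": "1 Example Street", "geoN": 55.68, "geoE": 12.57})
work_address = roll_up("Person", "workAddress", members, stored)
```

Members missing from the store come back as None here; a real server
would have to decide how to represent unset members on the wire.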

The next problem is our hierarchical query matching (i.e. that a query on
xesam:author should also match child fields of xesam:author). It
could be that structs would somehow mess this up, but this is also a
non-issue as it turns out. If we require that struct members are also
full-fledged fields (e.g., continuing our example, the postalAddress
member of the Address struct would be a child of
xesam:physicalAddress), we get fully normal query expansion
for a query into a struct field (like the query fragment above).
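
To illustrate, here is a small Python sketch of query expansion over a
field hierarchy in which a struct member is an ordinary child field. The
hierarchy table is a made-up fragment, and the name xesam:postalAddress
for the member field is hypothetical.

```python
# Hypothetical fragment of the field hierarchy: parent -> child fields.
CHILDREN = {
    "xesam:physicalAddress": ["xesam:postalAddress"],
    "xesam:postalAddress": [],
}


def expand(field, children=CHILDREN):
    """Return the field itself followed by all of its descendants."""
    result = [field]
    for child in children.get(field, []):
        result.extend(expand(child, children))
    return result


fields_to_match = expand("xesam:physicalAddress")
```

Since the struct member is just another node in the hierarchy, the
expansion code needs no special case for structs.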

Next technical problem - nested structs and lists of structs. While
nested structs should work fine as I outline above, I see no clear
solution in the query language. So I would probably not recommend
this. Then about lists of structs - some implementations, like Tracker
I believe, store lists in a different way than normal fields. This
might make it tricky to support lists of structs (which we would like
if we want to allow multiple home addresses per person).

Last issue I've thought about - hit data retrieval. To keep it short I
don't think we should change the API or session props. This means that
naming a struct field in hit.fields will always retrieve the entire
struct. Retrieval of individual struct members is hence not possible.

PROS
 * Some people close to the project have voiced concern about the
complexity and size of the ontology. Structs might induce some more
order and coherence.
 * Able to be data-compatible with other de facto standards, namely vCard
 * Can be added without too much work (it is not relation traversing
queries we talk about!)

CONS
 * Break ontology (a fair bit) and query language (very little)
 * Jeopardize project credibility by changing low level stuff like this in an RC
 * More work

MY OPINION
As stated when I started I am greatly concerned about breaking
anything at this point. OTOH there is a reason why it is an RC and not
1.0. For me to give +1 it would require:

 * We can add it without causing too much work
 * All server maintainers give +1
 * It has absolute minimal impact on the API
 * We document meticulously what changed and how consumers should react
 * It addresses (almost) all issues Evgeny has raised

I think that what I've outlined above meets these, but I can very well
have missed something. What I am most unsure about is whether this
model actually solves our issues and whether people are going to have
trouble implementing lists of structs.

Cheers,
Mikkel