[Opensync-devel] OpenSync: fragmentation is harmful

Tue Jan 4 09:04:39 UTC 2011

Hello!

Let me add the SyncEvolution list, because the technical information may
be relevant. For those who see this for the first time, it started with
an open letter that I sent to the OpenSync list asking whether it really
still makes sense to continue with two different projects instead of
focusing on one:
http://sourceforge.net/mailarchive/forum.php?thread_name=20110103221846.GA21876%40foursquare.net&forum_name=opensync-devel

I was suggesting that SyncEvolution has the better baseline to continue
from. Of course this requires further explanation, which this email is
about. It is in reply to Graham because he raised several interesting
technical questions.

On Mon, 2011-01-03 at 16:57 +0000, Graham Cobb wrote:
> On Monday 03 January 2011 13:32:52 Patrick Ohly wrote:
> > In this email I'd like to appeal to the OpenSync developers to
> > reconsider whether keeping OpenSync around really helps Linux and open
> > source syncing. 
> 
> Patrick,
> 
> Thanks for your well considered thoughts.
[...]
> > Of course I am thinking of SyncEvolution here. It already works very
> > well for SyncML. I also have support for additional protocols, which I
> > will be able to publish soon. I think it would be a worthwhile successor
> > of OpenSync, but obviously I'll need help to cover all the use cases
> > that you were shooting for with OpenSync.
> 
> I do not have a good feel for what SyncEvolution can and cannot do: can you 
> provide more information?  One thing is the device access protocols, of 
> course, but even if we all joined you and helped implement those, how would 
> SyncEvolution compare with what OpenSync is intended to do?  

SyncEvolution has grown organically over time (one could call it
evolutionary...), instead of shooting for a grand design covering
everything, like OpenSync did for 0.40. The main advantage is that there
have been regular stable releases since the very beginning four years
ago. On the other hand, features for which there was no real need yet
are still missing.

There have been different phases:
     1. SyncML client for Evolution
     2. SyncML client for additional storages (iPhone, Mac OS X, file)
     3. backends contributed by external developers (Ove Kaaven: N900
        calendar, Franz Knipp/m-otion.com: XMLRPC)
     4. SyncML server (direct syncing with phones), both via Bluetooth
        and HTTP, using the Synthesis engine
     5. non-SyncML protocols

The last point is the goal for SyncEvolution 1.2, in development right
now. It still uses the Synthesis engine and everything that it provides
(data conversion, conflict handling). SyncML is also still in use, but
only as an internal protocol between two peers. What a developer above the
engine sees is the storage plugin (aka data source) interface.
Conceptually such a plugin must provide:
     1. change tracking (otherwise only slow syncs work)
     2. data import/export, either in the internal Synthesis format
        (field list) or in a backend specific text format that the
        engine understands
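
To illustrate the change tracking requirement: a minimal sketch, assuming
each item has an id and a revision string (for example a modification
time stamp) and that the backend remembers the id/revision pairs from the
previous sync. The function name and the sample data are made up for this
illustration; this is not the TrackingSyncSource API.

```python
def detect_changes(previous, current):
    """Compare two {item_id: revision} snapshots.

    Returns (added, updated, deleted) as sorted lists of item ids.
    """
    added = sorted(i for i in current if i not in previous)
    deleted = sorted(i for i in previous if i not in current)
    updated = sorted(i for i in current
                     if i in previous and current[i] != previous[i])
    return added, updated, deleted


# Hypothetical snapshots taken at the previous sync and now.
last_sync = {"a": "rev1", "b": "rev1", "c": "rev1"}
now = {"a": "rev1", "b": "rev2", "d": "rev1"}
added, updated, deleted = detect_changes(last_sync, now)
```

Without such a remembered snapshot there is nothing to compare against,
which is why only slow syncs work in that case.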

Further references:
      * introduction to the Synthesis engine and its data conversion:
        http://syncevolution.org/development/pim-data-synchronization-why-it-so-hard
      * convenience class for a data source which has id + revision
        string for each item and exchanges data as text:
        http://meego.gitorious.com/meego-middleware/syncevolution/blobs/master/src/syncevo/TrackingSyncSource.h
      * base class with maximum freedom:
        http://meego.gitorious.com/meego-middleware/syncevolution/blobs/master/src/syncevo/SyncSource.h
      * fully functional example backend:
        http://meego.gitorious.com/meego-middleware/syncevolution/trees/master/src/backends/file
      * configuration handling:
        http://syncevolution.org/development/configuration-handling
      * communication patterns and server mode:
        http://syncevolution.org/development/direct-synchronization-aka-syncml-server
      * local sync:
        http://www.mail-archive.com/syncevolution@syncevolution.org/msg01419.html

Ove and Franz were able to implement their backends with very little
assistance, so the documentation can't be that bad, although there is
certainly room for improvement.

> For example, does it only handle pair-wise sync?  If so, what is the 
> implication of that restriction (do you have to designate one of your devices 
> as master and sync everything else to it)?

Yes, sync is always between two peers. One storage should be the
designated "master" copy of the data. Any data which cannot be stored by
that "master" will get lost. The master could be in a capable system
like EDS or Akonadi, or in the file backend, which can store anything
that the sync engine itself can handle.

A sync topology is created by defining several of these 1:1
relationships. The master itself might be the client of another server,
as long as there are no loops. There is currently no logic for keeping
several of these peers in sync, but that could be added at a meta level
(keep syncing until all changes have been distributed).
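
A rough sketch of what such a meta level could look like, under strong
simplifying assumptions: each storage is modeled as a plain set of items,
a pairwise sync is modeled as a set union (no deletions, no conflicts),
and all names are invented for this example. It first checks that the 1:1
relationships contain no loop and then repeats the pairwise syncs until
nothing changes anymore.

```python
def has_loop(pairs):
    """Return True if the undirected 1:1 sync relationships form a cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb:
            return True  # a and b were already connected
        parent[ra] = rb
    return False


def sync_until_stable(pairs, stores):
    """Repeat all pairwise syncs until no store changes anymore."""
    changed = True
    while changed:
        changed = False
        for a, b in pairs:
            merged = stores[a] | stores[b]
            if merged != stores[a] or merged != stores[b]:
                stores[a] = set(merged)
                stores[b] = set(merged)
                changed = True


# Hypothetical topology: phone <-> desktop <-> server
pairs = [("phone", "desktop"), ("desktop", "server")]
stores = {"phone": {"x"}, "desktop": set(), "server": {"y"}}
if not has_loop(pairs):
    sync_until_stable(pairs, stores)
```

A real implementation would of course invoke the actual sync for each
pair and would have to cope with deletions and conflicts.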

Unknown extensions are currently dropped. This could be changed, but
leads to additional questions that would need to be sorted out: should
such extensions be sent to all peers, or just to the one that created them?
What if different peers have a different understanding of "X-FOOBAR"?

It is safer to limit syncing to the data that is fully understood and
modeled in the Synthesis configuration file. Currently this covers vCard
3.0 + extensions and iCalendar 2.0 (including UID + RECURRENCE-ID,
VTIMEZONE, VALARM, but not attachments).

> Does it handle devices that have bugs or limited implementations (issues like 
> capabilities and merging)?

Yes. The Synthesis engine has dealt with that for 10 years and contains
a large collection of tools that can be used to deal with such problems,
ranging from different data profiles to a full scripting language that
can modify data on-the-fly. The Synthesis engine uses capability
descriptions to determine which properties are supported by an unknown
peer and has smart merging techniques for individual properties.

For example, consider the case where a VEVENT was modified like this:
     1. event in sync on peer A and B
     2. DESCRIPTION is extended on peer A
     3. SUMMARY is modified on peer B
     4. syncing recognizes the conflict and resolves it by using the
        SUMMARY of peer B (because the item on B is more recent) and the
        DESCRIPTION of A (because the description of B is a subset of
        it)

These two properties are handled differently because the conflict
resolution policy is configured differently to reflect the difference
between single-line and multi-line text.
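
The example above can be sketched as follows. This is only an
illustration of the two policies, not the Synthesis implementation: the
dict layout, the "modified" time stamp and the use of a substring test
for "is a subset" are assumptions made for this sketch.

```python
def merge_event(a, b):
    """Merge two versions of the same event.

    Each version is a dict with 'modified', 'SUMMARY' and 'DESCRIPTION'.
    """
    newer, older = (a, b) if a["modified"] >= b["modified"] else (b, a)
    merged = dict(newer)  # single-line text: the more recent item wins
    # Multi-line text: if the newer text is contained in the older one,
    # keep the longer text instead of losing the extension.
    if newer["DESCRIPTION"] in older["DESCRIPTION"]:
        merged["DESCRIPTION"] = older["DESCRIPTION"]
    return merged


# Peer A extended the DESCRIPTION, peer B changed the SUMMARY later.
peer_a = {"modified": 1, "SUMMARY": "Meeting",
          "DESCRIPTION": "Agenda\nBring laptop"}
peer_b = {"modified": 2, "SUMMARY": "Team meeting",
          "DESCRIPTION": "Agenda"}
result = merge_event(peer_a, peer_b)
```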

>   What about missing unique IDs?

In such a case only slow syncs are possible. The Synthesis data modeling
defines which properties are compared to find pairs. The drawback of a
slow sync is that data removed on one side will be recreated.
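
A simplified sketch of that drawback, assuming items are plain dicts,
that pairs are found by comparing a configurable list of fields, and
that keys are unique on each side. The function and field names are
hypothetical.

```python
def slow_sync(local, remote, match_fields):
    """Pair items of two lists by comparing match_fields.

    Items without a partner are copied to the other side.
    Returns the list of (local_item, remote_item) pairs found.
    """
    def key(item):
        return tuple(item.get(f) for f in match_fields)

    remote_by_key = {key(r): r for r in remote}
    local_keys = {key(l) for l in local}
    pairs = [(l, remote_by_key[key(l)])
             for l in local if key(l) in remote_by_key]
    missing_remote = [l for l in local if key(l) not in remote_by_key]
    missing_local = [r for r in remote if key(r) not in local_keys]
    remote.extend(dict(i) for i in missing_remote)
    local.extend(dict(i) for i in missing_local)
    return pairs


# "Bob" was deleted locally, but the remote side still has him.
local = [{"name": "Alice", "tel": "1"}]
remote = [{"name": "Alice", "tel": "1"}, {"name": "Bob", "tel": "2"}]
pairs = slow_sync(local, remote, ["name"])
```

After the call, "Bob" exists on both sides again, because a slow sync
cannot distinguish "deleted here" from "added there".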

I have thought a bit about that over Christmas, because I am now in that
situation: I can modify the address book on my FRITZ!Box 7390 router,
but it is an XML file with no unique identifier for each entry. My idea
is to do synchronization in multiple steps:
     1. keep a local mirror of all contacts
     2. do a slow sync against that mirror to find pairs; items in the
        mirror which have no corresponding entry on the router can be
        removed
     3. two-way sync between the mirror and my master data
     4. upload a copy of the mirror to the router

The simpler alternative would be to pick some properties and use those
as the key, perhaps with hashing to keep the key size small.
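
A minimal sketch of that alternative, assuming that each entry is a dict
and that "name" and "number" are the properties picked as key. Both field
names, the normalization and the truncation to 16 hex digits are choices
made for this example only.

```python
import hashlib


def content_key(entry, fields=("name", "number")):
    """Derive a short, stable key from selected properties of an entry."""
    # Normalize so that case and surrounding whitespace do not matter.
    parts = [entry.get(f, "").strip().lower() for f in fields]
    # Use a separator that is unlikely to occur in the values themselves.
    joined = "\x1f".join(parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:16]


key_1 = content_key({"name": "Alice", "number": "0123"})
key_2 = content_key({"name": " alice ", "number": "0123"})
key_3 = content_key({"name": "Bob", "number": "0123"})
```

The obvious limitation is that editing one of the key properties turns
the entry into a new item from the point of view of the sync.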

> Conflicts?

See above. Client-wins/server-wins/most-recent-wins are all
configurable. SyncEvolution itself uses most-recent-wins, with smart
merging of some properties.

>   And the 
> many other issues that OpenSync has been adding complexity while trying to 
> solve?

We would need to list those, but I'm fairly sure that much of it has
been considered already.

> In summary, I would like to understand why you feel that redirecting our 
> efforts to SyncEvolution has any greater chance of success in solving the hard 
> problems of syncing.

My own summary, more at a meta level than the details above:
      * don't reinvent the wheel, use a mature engine (Synthesis)
      * add features in small steps (more manageable, immediately
        useful)

-- 
Bye, Patrick Ohly
--  
Patrick.Ohly at gmx.de
http://www.estamos.de/