Finishing the network protocol

Tue Feb 22 10:16:17 PST 2011

Hello,

I've started a Wayland implementation, currently called "area". It is
written in C++, and its main goal is to work on all hardware that
currently works on Linux in some way (framebuffer or X11).
I've started with the network code and noticed a few things that still
look like prototype code in Wayland, probably unchanged from early
versions.
Wayland is becoming a serious project, so I think we should start
finishing and fixing the protocol. Also, I don't know whether I will
finish my own project, so I'd like to contribute my findings to the
main project.

First, a (hopefully correct) primer on the Wayland protocol for
interested onlookers.
- The Wayland protocol is a remote procedure call protocol of sorts. All
  messages are exchanged between objects; the protocol is asynchronous
  and no methods have return values as such.
  Methods can "return values" by triggering a message back.
  Everything is asynchronous, but order of messages is preserved.
- Each protocol-level object exists on both client and server.
- It's easiest to think of all objects being created by the server on
  behalf of the clients.
- Object constructors don't return anything; they take a pre-chosen
  object ID argument that can later be used to refer to the created
  object. If creation succeeds, no message is sent back.
- The conversation between client and server is bootstrapped by creating
  the "display" object on each client, which then starts talking to the
  global display object on the server.
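
To make the primer concrete, here is a minimal sketch of the server
side of such a constructor call. It is not the real wire format or
libwayland code: Message, ServerSide, dispatch, the request name
"create_surface" and the display ID of 1 are all assumptions made up
for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory message; NOT the real Wayland wire format.
struct Message {
    std::uint32_t target;  // ID of the object the message is addressed to
    std::string   name;    // request or event name
    std::uint32_t new_id;  // pre-chosen ID of the object to create, 0 if none
};

constexpr std::uint32_t kDisplayId = 1;  // assumed ID of the bootstrap object

// Server-side view of the object IDs known for one client.
class ServerSide {
public:
    ServerSide() { objects_.insert(kDisplayId); }

    // Handles one request and returns the events to send back.
    // A successful request, including object creation, returns no events.
    std::vector<Message> dispatch(const Message& request) {
        std::vector<Message> events;
        const bool bad_target = objects_.count(request.target) == 0;
        const bool id_taken =
            request.new_id != 0 && objects_.count(request.new_id) != 0;
        if (bad_target || id_taken) {
            // Only failure triggers a message back to the client.
            events.push_back(Message{kDisplayId, "error", 0});
            return events;
        }
        if (request.new_id != 0) {
            objects_.insert(request.new_id);
        }
        return events;
    }

    bool has(std::uint32_t id) const { return objects_.count(id) != 0; }

private:
    std::set<std::uint32_t> objects_;
};

// Usage: the client picks ID 5 up front and hears nothing back on success.
inline void example_create() {
    ServerSide server;
    assert(server.dispatch({kDisplayId, "create_surface", 5}).empty());
    assert(server.has(5));
    // Creating a second object under a live ID is reported as an error.
    assert(server.dispatch({kDisplayId, "create_surface", 5}).size() == 1);
}
```

The point of the sketch is only that the ID travels in the request, so
the client can start addressing object 5 without waiting for an answer.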

I see the following issues:
- There is no way to subscribe to events, or rather there is no way not
  to subscribe to all events.
- Range-based issuance of object IDs (pointers obviously can't be used
  between processes) is not bulletproof. Ranges can become fragmented
  to the point where none can be reclaimed, because every range still
  contains at least one ID in use. There is also currently no code that
  tries to reclaim ranges.
  The practical implication is that a Wayland server cannot, by design,
  run indefinitely without exhausting IDs or ID ranges.
  A related problem is that a client can interfere with another
  client's operation by using, intentionally or not, IDs belonging to
  the other client.
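
The fragmentation problem can be illustrated with a small sketch. It
does not model the existing Wayland code; RangeUsage, the function
pinned_after_churn and the range size used in the example are
assumptions chosen only to show the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Counts live IDs per range. A range can be reclaimed only when it
// holds no live ID at all. (Hypothetical helper, not Wayland code.)
class RangeUsage {
public:
    explicit RangeUsage(std::uint32_t range_size) : range_size_(range_size) {
        assert(range_size > 0);
    }

    void acquire(std::uint32_t id) { ++live_[id / range_size_]; }

    void release(std::uint32_t id) {
        auto it = live_.find(id / range_size_);
        assert(it != live_.end());
        if (--it->second == 0) {
            live_.erase(it);  // the range is empty and could be reclaimed
        }
    }

    // Number of ranges that still hold at least one live ID.
    std::size_t pinned_ranges() const { return live_.size(); }

private:
    std::uint32_t range_size_;
    std::map<std::uint32_t, std::size_t> live_;  // range index -> live IDs
};

// Worst case: all IDs are used, then all but one per range are freed.
inline std::size_t pinned_after_churn(std::uint32_t range_size,
                                      std::uint32_t ranges) {
    RangeUsage usage(range_size);
    const std::uint32_t total = range_size * ranges;
    for (std::uint32_t id = 0; id < total; ++id) {
        usage.acquire(id);
    }
    for (std::uint32_t id = 0; id < total; ++id) {
        if (id % range_size != 0) {
            usage.release(id);  // keep only the first ID of each range
        }
    }
    return usage.pinned_ranges();
}
```

With an assumed range size of 256, pinned_after_churn(256, 4) leaves
all four ranges pinned although only four IDs out of 1024 are still
in use.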

Based on the TODO file and a few ideas of my own, I suggest the
following changes to the protocol:

- Have one ID<->object map per client, except for global objects where
  there is a global map in the server.
  This is suggested in the TODO file; I implemented it this way from
  the start.
  Obviously each client will have its own map in the client process
  anyway.
- Have three ID ranges:
   a) for global objects
   b) for client-specific objects created by the server (do they exist?)
   c) for client-specific objects created by the client
  where "created by" really means "creation initiated by", i.e. the
  side that assigns the object ID.
- Handle subscription without an extra mechanism by creating or not
  creating the object that will receive the desired events. Might need
  some splitting of existing objects.
  This would IMHO be an elegant and minimal way to handle the matter.
- A scheme to recycle object IDs. When a new ID is needed, pick a free
  one at random. This introduces a problem:
  Suppose the client destroys object A with ID n, then by chance
  immediately reuses ID n for object B.
  The server only learns of this later, because the Wayland protocol
  is asynchronous and the server does not respond to an object
  creation request unless it fails. In the meantime the server could
  send an event intended for A, which would end up at B, causing Bad
  Things to happen - in my implementation most likely an assertion
  failure, unless the objects are of the same class.
  (This is the trickiest failure mode I could think of.)
  The suggested solution is a kind of "rendezvous" for objects where this
  can happen, or for simplicity all objects:
  On both client and server, have a function that needs to be called
  twice to unregister an object ID.
  One call comes from the destructor of the local object when it
  destroys itself, the other from the remote counterpart object when it
  destroys itself. Regardless of the order in which the two calls
  arrive, the first call removes the ID<->object mapping and puts the
  object ID on a waiting list to prevent reuse. The second call removes
  the ID from the waiting list, making it free to reuse.
- Specify how parent-child relationships work. For example (a bad
  example, since the answer is probably no here): is a surface
  automatically destroyed when its screen goes away? By whom?
- Specify who gets to delete objects and how that looks in the
  protocol. This is probably more a matter of documentation; I didn't
  read all of Wayland's code carefully, and implementers ideally
  shouldn't have to.
- Add information in the protocol description XML file about things like
  an object being global or not, and basically everything mentioned above
  that can benefit from help from the code generator.
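
The two-call "rendezvous" for recycling object IDs suggested above
could look roughly like the sketch below. IdTable and its member names
are hypothetical and not taken from Wayland or from my code as it
stands; one such table would exist per connection on each side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical ID<->object table with a waiting list for retired IDs.
template <typename Object>
class IdTable {
public:
    // Registers an object under an ID; refuses IDs that are live or
    // still on the waiting list.
    bool bind(std::uint32_t id, Object* object) {
        if (!reusable(id)) {
            return false;
        }
        live_[id] = object;
        return true;
    }

    // Returns the object for an ID, or nullptr once it is unmapped.
    Object* lookup(std::uint32_t id) const {
        auto it = live_.find(id);
        return it == live_.end() ? nullptr : it->second;
    }

    // Must be called twice per ID, in either order: once by the local
    // object's destructor and once when the remote counterpart reports
    // its own destruction.
    void unregister(std::uint32_t id) {
        if (live_.erase(id) == 1) {
            waiting_.insert(id);  // first call: unmap, block reuse
            return;
        }
        const std::size_t removed = waiting_.erase(id);  // second call
        assert(removed == 1 && "unregister called too often for this ID");
        (void)removed;
    }

    bool reusable(std::uint32_t id) const {
        return live_.count(id) == 0 && waiting_.count(id) == 0;
    }

private:
    std::map<std::uint32_t, Object*> live_;
    std::set<std::uint32_t> waiting_;
};
```

Between the two calls, lookup() returns nullptr, so a late event for
the destroyed object A can be dropped instead of reaching a new object
B, and bind() refuses the ID until the other side has confirmed.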

I'm not publishing a repository URL right now because I haven't chosen a
license yet and because I've copied over the wayland.xml protocol
description file that bears no license header. Kristian, what about the
license of that file?
If there is interest I can polish my code a bit and publish a URL.

Cheers,
Andreas