deadlock protection

Mon Oct 18 13:28:03 PDT 2004

Hi,

Just doing some initial half-baked thinking on deadlock protection.
For reference:
http://lists.kde.org/?t=109701199100001&r=1&w=2

Let me naively map this to D-BUS, and then describe some possible
problems with the naive approach. I could use some help figuring out
details of how to do this.

The D-BUS changes could be:

 1 for each outgoing method call, have a call stack ID that will 
   identify the call stack that method call initiates or is part of

 2 have dbus_message_set_call_stack (message, id) for setting the ID

 3 some sort of API to encourage or automate setting it; this could 
   be that any outgoing messages queued while the current incoming 
   message is being processed would have the ID of the incoming
   message, and outgoing messages with no incoming call on the stack
   would have a new ID.

 4 this automated mechanism would not work if people do async stuff
   based on either the main loop or threads, but they could still
   propagate the ID manually

 5 bindings could force setting the ID by having a single-threaded
   never-any-async API

 6 if an incoming method call has the call stack ID of a reply-pending 
   outgoing call, jump the incoming call to the front of the message
   queue and dispatch the main loop until the incoming call is popped 
   off the queue
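
To make item 6 concrete, here is a rough sketch of the queue jump. It
is a toy model, not libdbus code: Connection, Incoming,
add_pending_call() and queue_incoming() are all made-up names, and the
fixed-size arrays are only there to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE   16
#define MAX_PENDING 16

/* Hypothetical stand-in for an incoming method call; not a libdbus type. */
typedef struct {
    unsigned serial;        /* arrival order, for illustration only */
    unsigned call_stack_id; /* ID carried by the message (item 1) */
} Incoming;

/* Hypothetical per-connection state. */
typedef struct {
    Incoming queue[MAX_QUEUE];     /* index 0 is dispatched next */
    size_t   n_queued;
    unsigned pending[MAX_PENDING]; /* IDs of reply-pending outgoing calls */
    size_t   n_pending;
} Connection;

/* Record that an outgoing call with this call stack ID awaits a reply. */
static void add_pending_call(Connection *c, unsigned id)
{
    assert(c->n_pending < MAX_PENDING);
    c->pending[c->n_pending++] = id;
}

static bool has_pending(const Connection *c, unsigned id)
{
    for (size_t i = 0; i < c->n_pending; i++)
        if (c->pending[i] == id)
            return true;
    return false;
}

/* Item 6: a call that belongs to a call stack we are already waiting
 * on goes to the front of the queue; everything else goes to the back. */
static void queue_incoming(Connection *c, Incoming m)
{
    assert(c->n_queued < MAX_QUEUE);
    if (has_pending(c, m.call_stack_id)) {
        for (size_t i = c->n_queued; i > 0; i--)
            c->queue[i] = c->queue[i - 1];
        c->queue[0] = m;
    } else {
        c->queue[c->n_queued] = m;
    }
    c->n_queued++;
}
```

Note that this sketch makes no attempt to preserve ordering among
several jumped messages; a second matching call would land ahead of
the first one.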

Issues:

 - reordering of messages. In the above proposal, I suggested for 
   consistency that the reordering happen any time we have a pending 
   reply, not only when we are *blocking* for said reply. I'm having 
   trouble thinking of a good example of when reordering will cause 
   problems, but I'm not confident I've fully thought through 
   when it will happen and what the consequences will be.

 - as mentioned in Waldo's post, we are not fixing all deadlocks; 
   only a certain common type of deadlock. Anytime an app blocks,
   a deadlock is possible. Really the only way to avoid this is 
   to write everything async.

 - because dbus allows multithreaded and main-loop-based async
   calls, we can't fully/reliably automate tracking the call stack ID;
   some app developers may screw it up

 - in the above, item 6 has "dispatch the main loop" - but libdbus
   currently has no way to do that; the main loop is only a concept
   higher up in the bindings

 - in item 3, "while the current incoming message is being processed"
   isn't a concept we have right now: messages are just popped
   off the queue and never put back, so there's no "end" of the 
   processing, other than perhaps the message getting unref'd.
   That's in libdbus; the bindings may have a "current message"
   concept.
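
One way a binding (or a future libdbus) might give "the current
incoming message" an explicit start and end, so that item 3 can be
automated, is sketched below. Everything here is hypothetical:
CallStackTracker, begin_dispatch(), end_dispatch() and
id_for_outgoing_call() do not exist anywhere, and the local counter
glosses over the fact that real IDs would have to be unique across
the whole bus.

```c
#include <assert.h>

/* Hypothetical per-thread (or per-connection) state. */
typedef struct {
    unsigned current_id; /* ID of the message being handled, 0 if none */
    unsigned next_id;    /* source of fresh IDs; a real ID would need to
                            be bus-wide unique, which this is not */
} CallStackTracker;

static void tracker_init(CallStackTracker *t)
{
    t->current_id = 0;
    t->next_id = 1;
}

/* Marks the start of processing an incoming message. Returns the
 * previous ID so that nested dispatch can restore it afterwards. */
static unsigned begin_dispatch(CallStackTracker *t, unsigned incoming_id)
{
    assert(incoming_id != 0); /* 0 is reserved for "no current message" */
    unsigned saved = t->current_id;
    t->current_id = incoming_id;
    return saved;
}

/* Marks the explicit "end" of processing that libdbus lacks today. */
static void end_dispatch(CallStackTracker *t, unsigned saved)
{
    t->current_id = saved;
}

/* Item 3: inherit the ID of the message being handled, or start a
 * new call stack if nothing is being handled. */
static unsigned id_for_outgoing_call(CallStackTracker *t)
{
    if (t->current_id != 0)
        return t->current_id;
    return t->next_id++;
}
```

This only works while the handler runs synchronously between
begin_dispatch() and end_dispatch(); work deferred to the main loop or
handed to another thread would still have to carry the ID by hand.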

A couple thoughts on alternate approaches, not sure these are going to
be useful, but noting them:

 - we could punt most of this to the bindings; i.e. introduce 
   call stack ID to the protocol and libdbus, but require bindings
   to figure out how to conveniently track and propagate it.
   Disadvantage of course is that the app you're talking to may 
   lose track of the call stack if its binding doesn't support it.

 - rather than jumping the would-deadlock incoming method call to the 
   front of the queue, we could return an error, an "EWOULDDEADLOCK"
   sort of thing. The advantage is not having to worry about the
   semantics of message reordering, or how to invoke the main loop. 
   Deadlocks would still need debugging (same as if they had in fact 
   deadlocked), but they would not lock up the apps, which 
   would be nice for users.

   This potentially solves more deadlock cases, however, 
   in that we could have an app mark "will block for reply" 
   on outgoing calls, and then the bus can know when a 
   client is blocking and which app it is blocking on.
   So in the "apps send a call to each other simultaneously"
   case we could detect the deadlock and return an error.
   Maybe worth doing anyway.
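
A rough sketch of how the bus side of the "will block for reply" idea
might look: the bus remembers who is blocked on whom, and refuses a
blocking call that would close a cycle. This is a toy model under
stated assumptions, not bus code. Bus, bus_blocking_call() and
bus_reply_delivered() are made-up names, clients are small integers,
and each client blocks on at most one call at a time.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLIENTS 8
#define NOT_BLOCKED (-1)

/* Hypothetical bus-side bookkeeping: blocked_on[a] == b means client
 * a is blocking for a reply from client b. */
typedef struct {
    int blocked_on[MAX_CLIENTS];
} Bus;

static void bus_init(Bus *bus)
{
    for (int i = 0; i < MAX_CLIENTS; i++)
        bus->blocked_on[i] = NOT_BLOCKED;
}

/* Called when `caller` sends a call to `callee` marked "will block for
 * reply". Returns false (think EWOULDDEADLOCK) if the callee is
 * already waiting, directly or indirectly, on the caller. */
static bool bus_blocking_call(Bus *bus, int caller, int callee)
{
    assert(caller >= 0 && caller < MAX_CLIENTS);
    assert(callee >= 0 && callee < MAX_CLIENTS);

    /* Follow the chain of waits starting at the callee. The hop limit
     * is purely defensive; accepted calls never form a cycle. */
    int hops = 0;
    for (int c = callee; c != NOT_BLOCKED && hops <= MAX_CLIENTS;
         c = bus->blocked_on[c], hops++) {
        if (c == caller)
            return false;
    }

    bus->blocked_on[caller] = callee;
    return true;
}

/* Called when the reply (or an error) reaches the blocked caller. */
static void bus_reply_delivered(Bus *bus, int caller)
{
    assert(caller >= 0 && caller < MAX_CLIENTS);
    bus->blocked_on[caller] = NOT_BLOCKED;
}
```

In the "apps send a call to each other simultaneously" case, whichever
call the bus sees first is recorded and the second one gets the error,
so the second app stops blocking and can service the first call.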

Anyhow, lots of details here.

Havoc