implementing IN_PROGRESS

Sat Feb 25 12:00:33 PST 2006

Hi,

Looked at this a bit.

Today
===

Refresher on how it works now (not that it necessarily works in the best
way).

The DBusConnection has a "big lock" that protects most fields in the
connection. However, there's also a "dispatch lock" which can be held
orthogonally; it protects DBusConnection->dispatch_acquired. This flag
can only be set or read when holding the dispatch lock. [1]

The dispatch lock is separate because we need to hold it while calling
out to application code, and we don't want all connection methods to
deadlock if the app calls them while dispatching.

If you call dbus_connection_borrow_message() that will acquire the
dispatch lock and leave it held until you return the message. If you
pop_message(), it has to take the dispatch lock while it does the
popping, but doesn't keep it held. The dispatch lock is also held while
calling the handlers/filters for a message (dispatching), of course.

In other words messages can't come off the message queue unless you have
the dispatch lock; and if you haven't removed the permanently yet, you
have to hold on to the dispatch lock until you do.

The problem
===

The problem arises when you want to dispatch recursively _in the same
thread_ that's currently dispatching. If you dispatch recursively from a
different thread, the code simply blocks on the cond variable for the
dispatch lock until the current dispatch ends.

Right now attempting to dispatch while "dispatch in progress" from the
same thread will deadlock.

The proposal is that "dispatch in progress" return a status indication
instead of deadlocking, and that callers then handle that indication.

Cases
===

Right now code calling _dbus_connection_acquire_dispatch() assumes it
will always succeed. With the IN_PROGRESS change, it will be able to
fail with an IN_PROGRESS indication if the *current thread* already has
the dispatch lock.

We need to think about everywhere that acquires the dispatch lock, and
what that code will do if a dispatch is in progress in the same thread.
grep for "_dbus_connection_acquire_dispatch()" in dbus-connection.c.

Functions this affects directly are:
 - _dbus_connection_borrow_message()
   if it can't get the dispatch what can it do? busy loop? return NULL?
 - _dbus_connection_pop_message()
   again, busy loop? return NULL?
 - dbus_connection_dispatch()
   - proposing a return of IN_PROGRESS, punting the problem to the 
     caller

We also have at least one indirect caller of one of the above three:
 - _dbus_connection_read_write_dispatch() calls dispatch(); 
   what does it do on IN_PROGRESS?
   I think without changing API, it either busy loops itself, 
   or becomes a no-op such that the caller ends up busy looping

And one thing I think is a bug, 
 - _dbus_connection_block_pending_call()
   currently pops a message without holding the dispatch lock,
   I think this is just not OK and should be fixed. But if 
   we fix it, what happens on IN_PROGRESS? either busy loop, 
   or fail to complete the pending call, essentially...

dbus_connection_get_dispatch_status
===

Right now you can ask for the dispatch status without actually
dispatching, which is done all over the place in dbus-connection.c.
It looks like almost all of these cases are getting the status only 
in order to call the "dispatch status function" set by
dbus_connection_set_dispatch_status_function().

The cases that care about the status themselves are the cases listed
above, that eventually acquire the dispatch anyhow.

Two ways to go with get_dispatch_status():

 - We could conceivably say that get_dispatch_status() never returns 
   IN_PROGRESS, and that only an actual dispatch() will detect 
   recursion.

 - Or we could have get_dispatch_status() return IN_PROGRESS if the 
   dispatch lock is held by the current thread.

For the cases in the bus daemon and the glib bindings, it looks like
this will make no difference whatsoever. (On the other hand, they don't
recurse in the same thread anyway.)

There may be some trickiness with updating to IN_PROGRESS for the
set_dispatch_status_function() function, since that will mean calling
out to application code at a sort of inconvenient time perhaps. I don't
know.

What does application/binding code do with IN_PROGRESS?
===

In figuring out how the various APIs handle or return the IN_PROGRESS
indication, we probably want to understand what apps will do when they
get this indication.

If they just try again (proceed as if they got DATA_REMAINS), they will
busy loop. If they never try again (proceed as if they got COMPLETED)
then they will hang and not get messages that arrived. They need to "try
again later" where later means "when the currently-dispatching handler
is popped off the call stack"

We have a prior example in the GLib main loop, which supports a "no
recurse" flag - AIUI what this means is that the event source should be
ignored for main loop iterations run while the event source is currently
dispatching. When the dispatch completes, the "outermost" main loop
iterator will see any further events on the event source.

In dbus terms, we could say that get_dispatch_status() should return
COMPLETED (= no dispatch needed/wanted) if a recursed main loop
iteration calls it, and should return DATA_REMAINS (= please dispatch
me) if the outermost main loop iteration calls it.

IN_PROGRESS or just COMPLETED?
===

Perhaps this is the correct solution to the problem? When recursed in
the same thread, we just return COMPLETED until we get un-recursed, and
then return DATA_REMAINS again? That might make things transparent to
apps, but I don't know if it would work out properly in practice.

If we guess that the outermost main loop iteration can't call
get_dispatch_status() or re-call dispatch(), because it's currently
inside the dispatch handler; then we know we don't need to return
DATA_REMAINS. If we return COMPLETED while dispatching, it should always
be going back to a recursed main loop, not the original one. Unless the
programmer does something like store the dispatch status while recursed
and use it later.

If this approach works we still need to fix the code to detect a
recursion and return COMPLETED instead of deadlocking, but we would not
need to fix up any application code.

Implementation
===

Once we figure out the API changes, how do we detect "current thread
already has the lock" in order to special case it? People who know more
than I do about threads are encouraged to reply.

We could add a "try_lock" function to DBusThreadFunctions, but this (I
believe) would not allow us to distinguish whether our thread has
already locked, vs. another thread has already locked, right?

We could add a "get current thread" function to DBusThreadFunctions, and
when we acquire the dispatch store the thread that acquired it rather
than just the dispatch_acquired flag. The
DBusConnection->dispatch_thread would have to be protected by the
connection lock, not the dispatch lock, so people could compare
dispatch_thread to get_current_thread() before trying to get the
dispatch lock. This means holding the connection lock in
acquire_dispatch(), which we don't currently do for some reason I don't
understand - but assuming there's no reason for that, I think this
approach works.

Other suggestions?

Havoc

[1] currently we set dispatch_acquired while not holding the connection
lock, and unset it while holding the connection lock, which is kind of
strange and possibly a bug