live lock in dbus condition variables on Windows (CE)

Marcus Brinkmann marcus.brinkmann at ruhr-uni-bochum.de
Wed Dec 22 08:05:12 PST 2010


Hi,

This probably affects desktop Windows, too, although we found it on Windows CE.

In the strigi search daemon (used in KDE), we found a situation where the DBus
library live-locks.  The daemon sends a dbus message to itself (it's
multithreaded), and although I have not figured out the complete call stack,
what eventually happens is this:

One thread (thread 1) tries to flush the connection with
dbus_connection_flush, but it can never acquire the io_path.  It is frequently
woken up from waiting on the io_path condition variable, but io_path_acquired
is never false when it checks, so it always goes back to sleep (it's stuck in
the while loop in _dbus_connection_acquire_io_path).
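
To make the loop concrete, here is a minimal sketch of that kind of
flag-plus-condition-variable lock.  It assumes POSIX threads as a portable
stand-in for the Windows primitives, and struct io_path and the io_path_*
functions are hypothetical names for illustration, not the dbus internals.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the io_path state kept in a connection. */
struct io_path {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            acquired;
};

void io_path_init(struct io_path *p)
{
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->acquired = false;
}

/* Block until the io_path is free, then take it. */
void io_path_acquire(struct io_path *p)
{
    pthread_mutex_lock(&p->mutex);
    /* A woken waiter re-checks the flag.  If another thread has set it
     * again in the meantime, the waiter simply goes back to sleep. */
    while (p->acquired)
        pthread_cond_wait(&p->cond, &p->mutex);
    p->acquired = true;
    pthread_mutex_unlock(&p->mutex);
}

/* Give the io_path back and wake one waiter, if there is one. */
void io_path_release(struct io_path *p)
{
    pthread_mutex_lock(&p->mutex);
    assert(p->acquired);
    p->acquired = false;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}
```

Nothing in this pattern guarantees that the signalled waiter is the next
thread to see the flag cleared; that depends entirely on scheduling.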

The other thread (thread 2), which holds the io_path, frequently calls
dbus_connection_flush followed by _dbus_connection_do_iteration_unlocked (I
have not tried to figure out the caller of
_dbus_connection_do_iteration_unlocked here; that's the gap in my knowledge
about this).  It acquires the io_path for
dbus_connection_flush, then releases it quickly, which wakes up a waiter on
the condition variable (there is one: it's thread 1).  But that waiter is not
yet woken up by the operating system, because this thread still has cpu time
left on its time slice.  It keeps running until it acquires the io_path again
in do_iteration_unlocked.  While this thread holds the io_path, it calls
_dbus_poll, which calls WaitForMultipleEvents, which finally yields the cpu to
other threads.  It is only at this point that thread 1 wakes up and tries to
acquire the io_path, and by then thread 2 has already snatched it away.

This cycle can apparently repeat indefinitely.  I suspect that thread 1 is the
writer, which is trying to flush before close, while thread 2 is the reader,
which starves the writer and thus can't make progress either.  Live lock.
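
The schedule can be replayed as a small deterministic toy model.  This is only
a sketch under the assumption of a single CPU where thread 1 runs solely when
thread 2 gives up the processor; struct sim, waiter_runs and simulate are
hypothetical names, and the yield_after_signal flag models a yield placed
directly after the wake-up.

```c
#include <stdbool.h>

/* State of the toy model; every name here is hypothetical. */
struct sim {
    bool io_path_acquired;  /* the flag both threads compete for */
    bool signaled;          /* thread 1 has a pending wake-up */
    int  waiter_wakeups;    /* times thread 1 ran and re-checked the flag */
    bool waiter_acquired;   /* thread 1 finally got the io_path */
};

/* Thread 1 gets the CPU: consume the wake-up and re-check the flag. */
void waiter_runs(struct sim *s)
{
    if (!s->signaled)
        return;
    s->signaled = false;
    s->waiter_wakeups++;
    if (!s->io_path_acquired) {
        s->io_path_acquired = true;
        s->waiter_acquired = true;
    }
}

/* Replay up to `rounds` iterations of thread 2's loop. */
struct sim simulate(int rounds, bool yield_after_signal)
{
    /* Thread 2 starts out holding the io_path for its flush. */
    struct sim s = { true, false, 0, false };

    for (int i = 0; i < rounds; i++) {
        /* Thread 2 releases after the flush and signals the waiter. */
        s.io_path_acquired = false;
        s.signaled = true;

        /* With a yield, thread 1 runs while the io_path is free. */
        if (yield_after_signal)
            waiter_runs(&s);
        if (s.waiter_acquired)
            break;

        /* Thread 2 still has time slice left and reacquires. */
        s.io_path_acquired = true;

        /* Thread 2 blocks in poll; only now is thread 1 scheduled. */
        waiter_runs(&s);
    }
    return s;
}
```

In this model, simulate(1000, false) wakes the waiter in every round without
it ever acquiring the io_path, while simulate(1000, true) lets it acquire on
its first wake-up.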

The reason behind this live lock is the unfairness of the condition variable
implementation under Windows (CE).  One possible fix would be to relinquish
the io_path during the WFME, but reacquiring it afterwards probably requires
some magic that I don't know about, so I haven't tried implementing it.  It
would also mean never blocking while the io_path is held, which is clearly not
the intention behind this design.

A different approach is to try to make the condition variable more fair.
SetEvent is not fair in that it does not yield, but by yielding with Sleep(0)
right after it we push the waiter into the mutex lock call, which is fair on
Windows.  See attached patch.
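
For readers without the attachment, the idea can be sketched as follows.  This
is not the attached patch, only a minimal illustration: wake_event_t and
wake_one_and_yield are hypothetical names, the Windows branch assumes the
waiter blocks on an event HANDLE, and the non-Windows branch uses
pthread_cond_signal and sched_yield purely as a portable stand-in.

```c
#ifdef _WIN32
#include <windows.h>

/* Assumption: the waiter blocks on this event handle. */
typedef HANDLE wake_event_t;

/* Wake one waiter, then give up the rest of the time slice. */
int wake_one_and_yield(wake_event_t ev)
{
    if (!SetEvent(ev))
        return 0;
    /* Sleep(0) yields to another ready thread, so the woken waiter
     * can reach the mutex before we take the lock again. */
    Sleep(0);
    return 1;
}

#else
#include <pthread.h>
#include <sched.h>

/* Portable stand-in: a POSIX condition variable instead of an event. */
typedef pthread_cond_t *wake_event_t;

/* Wake one waiter, then give up the CPU. */
int wake_one_and_yield(wake_event_t cv)
{
    if (pthread_cond_signal(cv) != 0)
        return 0;
    sched_yield();
    return 1;
}
#endif
```

The yield does not guarantee that the waiter wins, it only gives it a chance
to run before the signalling thread reacquires.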

Thanks,
Marcus

