Race condition in send_with_reply_and_block()
vstinner at wyplay.com
Mon Jul 22 09:15:41 PDT 2013
I'm investigating a race condition in our product (a set-top box).
Sometimes a D-Bus call fails with a timeout error after 25 seconds,
even though the client did receive the reply. My application calls
dbus_bus_add_match() and dbus_bus_remove_match() in one thread, and our
D-Bus loop runs in another thread (and the main thread is doing
something else ;-)).
dbus_bus_add_match() and dbus_bus_remove_match() indirectly call
dbus_connection_send_with_reply_and_block(). The loop calls
dbus_watch_handle() and dbus_connection_dispatch().
It looks like the "dbus_connection_send_with_reply_and_block() race
condition with threads" has been a known problem since 2004 (issue
#857), but it is still not solved. For example, my issue is well
described in this message:
Basically, the problem is that two threads access the same resources
(the transport, the pending call and the reply message). Different
kinds of locks are used (lock on the connection, on the connection
dispatch status, on the pending call, ...), but they are not enough to
protect D-Bus against this race condition.
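To make the interleaving concrete, here is a deterministic toy model in C. It is not libdbus code: `ToyConnection`, `dispatcher_pop_reply()`, `blocking_caller_check()` and `replay_interleaving()` are hypothetical names, and the replayed order of steps (the loop thread takes the reply out of the incoming queue and drops the connection lock, the blocking caller then looks for its reply, and only afterwards is the pending call completed) is a simplified reading of the race described above, replayed in a single thread so the result is reproducible. The `complete_before_unlock` flag models the workaround of setting pending->completed=TRUE before the lock is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, NOT libdbus internals. */
typedef struct {
    bool reply_in_queue;   /* reply message still in the incoming queue */
    bool completed;        /* models pending->completed */
} ToyConnection;

typedef enum { OUTCOME_REPLY, OUTCOME_TIMEOUT } Outcome;

/* Loop thread, step 1: under the connection lock, take the reply out of
 * the incoming queue. With the workaround, mark the pending call as
 * completed before the lock is released. */
static void dispatcher_pop_reply(ToyConnection *c, bool complete_before_unlock)
{
    c->reply_in_queue = false;
    if (complete_before_unlock)
        c->completed = true;
    /* the connection lock would be released here */
}

/* Blocking thread: under the connection lock, look for the reply. If the
 * call is neither completed nor has its reply queued, the caller goes back
 * to blocking on the socket, where no further data will arrive. */
static Outcome blocking_caller_check(const ToyConnection *c)
{
    if (c->completed || c->reply_in_queue)
        return OUTCOME_REPLY;
    return OUTCOME_TIMEOUT;   /* i.e. the 25 second timeout fires */
}

/* Loop thread, step 2: complete the pending call. This is too late for a
 * caller that is already blocked on the socket. */
static void dispatcher_complete(ToyConnection *c)
{
    c->completed = true;
}

/* Replays: dispatcher pops reply -> caller checks -> dispatcher completes. */
Outcome replay_interleaving(bool complete_before_unlock)
{
    ToyConnection c = { .reply_in_queue = true, .completed = false };
    dispatcher_pop_reply(&c, complete_before_unlock);
    Outcome seen = blocking_caller_check(&c);
    dispatcher_complete(&c);
    return seen;
}
```

In this model, `replay_interleaving(false)` ends in a timeout even though the reply arrived, while `replay_interleaving(true)` finds the reply. It deliberately models only one blocking caller, which matches the observation that the workaround is not enough with two concurrent callers.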
I found a workaround: set pending->completed=TRUE before releasing the
connection lock in dbus_connection_dispatch(). It is enough if
dbus_connection_send_with_reply_and_block() is only called from one
thread at a time, but it does not work with two threads calling it
concurrently.
dbus_connection_send_with_reply_and_block() should not "hijack" the
role of the main loop (handling watches and dispatching). As suggested
in the following comment, dbus_connection_send_with_reply_and_block()
should rely on the D-Bus loop and just wait for an event (like a
condition variable).
In my opinion, this is the best approach. But does it work in an
application without a D-Bus loop? The comment says:
"That does not solve the problem when two threads simultaneously call
dbus_connection_send_with_reply_and_block() on a
connection without a main loop, but it can be fixed by disallowing that (...)".
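The condition-variable approach from that comment, together with the "disallow it" rule for connections without a main loop, could look roughly like the sketch below. Everything here is hypothetical (`ToyPending`, `toy_pending_wait()`, `toy_pending_complete()`, `toy_begin_blocking_call_without_loop()`, ...) and uses plain POSIX threads rather than libdbus internals: the blocking caller never reads the socket or dispatches, it only waits, and the loop thread stores the reply and signals. The atomic flag models rejecting a second concurrent blocking call when there is no loop, since that caller would have to drive the I/O itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical pending call, not the libdbus DBusPendingCall. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            completed;
    int             reply;      /* stands in for the reply message */
} ToyPending;

void toy_pending_init(ToyPending *p)
{
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->completed = false;
    p->reply = 0;
}

/* Loop thread, when it dispatches the reply: store it and set `completed`
 * under the same mutex the waiter uses, then wake the waiter(s). */
void toy_pending_complete(ToyPending *p, int reply)
{
    pthread_mutex_lock(&p->mutex);
    p->reply = reply;
    p->completed = true;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

/* Blocking caller: only waits. Returns 0 on success, ETIMEDOUT otherwise. */
int toy_pending_wait(ToyPending *p, int timeout_ms, int *reply_out)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&p->mutex);
    while (!p->completed && rc == 0)   /* loop handles spurious wakeups */
        rc = pthread_cond_timedwait(&p->cond, &p->mutex, &deadline);
    if (p->completed) {
        *reply_out = p->reply;
        rc = 0;
    }
    pthread_mutex_unlock(&p->mutex);
    return rc;
}

/* No main loop: allow only one blocking caller at a time. */
static atomic_flag blocking_call_in_progress = ATOMIC_FLAG_INIT;

bool toy_begin_blocking_call_without_loop(void)
{
    return !atomic_flag_test_and_set(&blocking_call_in_progress);
}

void toy_end_blocking_call_without_loop(void)
{
    atomic_flag_clear(&blocking_call_in_progress);
}

/* Stand-in for the D-Bus loop thread: it "dispatches" the reply. */
static void *toy_loop_thread(void *arg)
{
    toy_pending_complete((ToyPending *)arg, 42);
    return NULL;
}

/* Usage: one thread waits while the loop thread completes the call. */
int toy_demo(void)
{
    ToyPending p;
    pthread_t loop;
    int reply = -1;
    toy_pending_init(&p);
    pthread_create(&loop, NULL, toy_loop_thread, &p);
    int rc = toy_pending_wait(&p, 2000, &reply);
    pthread_join(loop, NULL);
    pthread_mutex_destroy(&p.mutex);
    pthread_cond_destroy(&p.cond);
    return rc == 0 ? reply : -1;
}
```

Because `completed` is only read and written under `p->mutex`, the loop thread cannot complete the call between the waiter's check and its wait, so a reply that arrives in that window is not lost.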
I would like to fix this issue, but I have not been able to find the
right fix. Do you have any ideas?
A simpler solution may be to change the locking so that
dbus_connection_send_with_reply_and_block() and dbus_watch_handle() /
dbus_connection_dispatch() never run at the same time: add a new "big
lock" or rely on an existing lock like the dispatch lock. It may be a
simple and safe solution, but it would be less efficient :-(
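A minimal sketch of the "big lock" idea, again with hypothetical names and POSIX threads instead of the real libdbus locks: the blocking call and each loop iteration (watch handling plus dispatch) take the same mutex, so they can never overlap. The `inside` / `max_inside` counters exist only to check that property in this toy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical "big lock" shared by the blocking call and the loop. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Instrumentation (only touched while big_lock is held). */
static int inside = 0;      /* threads currently in the critical section */
static int max_inside = 0;  /* highest value observed */

static void critical_section(void)
{
    inside++;
    if (inside > max_inside)
        max_inside = inside;
    /* real code would read the socket and queue/dispatch the reply here */
    inside--;
}

/* Stand-in for send_with_reply_and_block(): holds the big lock while it
 * sends and waits for (reads) its own reply. */
static void toy_send_with_reply_and_block(void)
{
    pthread_mutex_lock(&big_lock);
    critical_section();
    pthread_mutex_unlock(&big_lock);
}

/* Stand-in for one loop iteration: watch handling plus dispatch. */
static void toy_loop_iteration(void)
{
    pthread_mutex_lock(&big_lock);
    critical_section();
    pthread_mutex_unlock(&big_lock);
}

static void *caller_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        toy_send_with_reply_and_block();
    return NULL;
}

static void *loop_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        toy_loop_iteration();
    return NULL;
}

/* Runs both threads; returns the maximum observed overlap. */
int toy_run(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, caller_thread, NULL);
    pthread_create(&b, NULL, loop_thread, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return max_inside;
}
```

The cost shows up in the structure: while a blocking call holds `big_lock` waiting for its reply (up to the timeout), the loop thread cannot handle any watch or dispatch anything else, which is the efficiency loss mentioned above.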