Threading problems in (Win)DBus
hp at redhat.com
Sat Jul 7 11:24:39 PDT 2007
From the traces, here two different threads take the lock at the same time:
10357: LOCK: _dbus_connection_acquire_dispatch 
10357: 10357: lock socket_do_iteration post poll
10357: LOCK: _dbus_connection_lock 
And then here also:
11703: LOCK: dbus_connection_send_with_reply 
11703: Allocated slot 0 on allocator 0xb7eb583c total 1 slots allocated
11703: LOCK: dbus_connection_get_is_connected 
So that should not be possible, right? Unless the implementation of
locking is messed up somehow (which it may well be) or the locks are
no-ops since threads are for some reason not enabled.
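To illustrate the "no-op lock" theory, here is a minimal sketch. It assumes a lock layer that calls through function pointers which stay NULL until threads are enabled. The lock_funcs table, enable_threads(), guarded_lock() and lock_is_actually_held() are hypothetical and are not DBus's actual API. With nothing installed, every "acquisition" succeeds without touching the mutex, so two threads could both appear to own the lock, which matches what the traces show.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical table of lock functions that stays empty until threads
 * are "enabled". NOT the real DBus API; illustration only. */
typedef struct {
    int (*lock)(pthread_mutex_t *m);
    int (*unlock)(pthread_mutex_t *m);
} lock_funcs;

static lock_funcs installed = { NULL, NULL };  /* threads not enabled yet */

static void enable_threads(void) {
    installed.lock = pthread_mutex_lock;
    installed.unlock = pthread_mutex_unlock;
}

/* Returns 1 if a real lock was taken, 0 if the call was a silent no-op. */
static int guarded_lock(pthread_mutex_t *m) {
    if (installed.lock == NULL)
        return 0;  /* no-op: any number of threads can "hold" the lock */
    int rc = installed.lock(m);
    assert(rc == 0);
    return rc == 0;
}

static void guarded_unlock(pthread_mutex_t *m) {
    if (installed.unlock == NULL)
        return;  /* nothing was really locked */
    int rc = installed.unlock(m);
    assert(rc == 0);
    (void)rc;
}

/* Probe: returns 1 if somebody really holds the mutex right now. */
static int lock_is_actually_held(pthread_mutex_t *m) {
    if (pthread_mutex_trylock(m) == 0) {
        pthread_mutex_unlock(m);
        return 0;  /* it was free, so the "lock" protected nothing */
    }
    return 1;  /* EBUSY: the mutex is genuinely locked */
}
```

In a debugger, the equivalent check is to look at whether the lock functions were ever installed before the connection was used.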
Perhaps it would be worth adding a few unit tests to dbus-sysdeps-pthread.c
to make sure the mutexes work properly; they would confirm or refute this
theory and help prevent similar bugs in the future.
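The following is a minimal sketch of the kind of mutual-exclusion test meant here. It uses plain POSIX mutexes directly. A real test in dbus-sysdeps-pthread.c would instead go through DBus's own mutex wrappers, and the names below (exclusion_state, hammer, run_exclusion_test, N_THREADS, N_ITERS) are hypothetical. Several threads increment one counter under the mutex while tracking how many of them are inside the critical section at once.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define N_ITERS 100000

/* Shared test state; every field after "mutex" is guarded by it. */
typedef struct {
    pthread_mutex_t mutex;
    long counter;    /* incremented only while holding the mutex */
    int inside;      /* threads currently in the critical section */
    int violations;  /* times more than one thread was seen inside */
} exclusion_state;

static void *hammer(void *arg) {
    exclusion_state *s = arg;
    for (int i = 0; i < N_ITERS; i++) {
        pthread_mutex_lock(&s->mutex);
        s->inside++;
        if (s->inside != 1)
            s->violations++;  /* only possible if the lock is broken */
        s->counter++;
        s->inside--;
        pthread_mutex_unlock(&s->mutex);
    }
    return NULL;
}

/* Runs N_THREADS threads against one mutex. Returns the number of
 * exclusion violations observed and stores the final counter value. */
static int run_exclusion_test(long *final_counter) {
    exclusion_state s = { .counter = 0, .inside = 0, .violations = 0 };
    pthread_t threads[N_THREADS];
    int rc = pthread_mutex_init(&s.mutex, NULL);
    assert(rc == 0);
    for (int t = 0; t < N_THREADS; t++) {
        rc = pthread_create(&threads[t], NULL, hammer, &s);
        assert(rc == 0);
    }
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&s.mutex);
    (void)rc;
    *final_counter = s.counter;
    return s.violations;
}
```

With a working mutex, the violation count is 0 and the counter equals N_THREADS * N_ITERS. Races are timing-dependent, so a broken lock is likely, but not guaranteed, to show up in a single run.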
Some background below on how the locks are supposed to work, to aid
in debugging:
Christian Grigis wrote:
> Another problem that I noticed: for the pthread implementation of
> threads in dbus-sysdeps-pthread.c, the _dbus_pthread_mutex_lock()
> function appears to simulate a reentrant lock by using the pmutex->count
> counter. Wouldn't the CONNECTION_LOCK macro then fail in case a lock is
> acquired twice by the same thread (in TOOK_LOCK_CHECK)?
> Are the two problems related?
My memory is that while all the locks support recursion, only the
dispatch_mutex is in fact supposed to be used recursively. That's why we
left have_connection_lock as a boolean rather than a counter.
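The distinction between a lock that supports recursion and a caller that must not recurse can be sketched as follows. The sketch emulates a recursive mutex with an owner/count pair on top of a plain pthread mutex. This is roughly the idea behind the pmutex->count counter, though it is not DBus's actual code. A connection-style object then wraps the mutex and records ownership in a boolean. All names (rec_mutex, conn, conn_lock, have_lock) are hypothetical. To keep the sketch testable, conn_lock() reports a nested acquisition instead of asserting as TOOK_LOCK_CHECK would.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical recursive mutex emulated with a plain mutex plus
 * owner/count bookkeeping. Illustrative only, not DBus's code. */
typedef struct {
    pthread_mutex_t lock;  /* the real, non-recursive lock */
    pthread_mutex_t meta;  /* protects owner and count */
    pthread_t owner;
    int count;             /* recursion depth; 0 means unowned */
} rec_mutex;

static void rec_mutex_init(rec_mutex *m) {
    pthread_mutex_init(&m->lock, NULL);
    pthread_mutex_init(&m->meta, NULL);
    m->count = 0;
}

static void rec_mutex_lock(rec_mutex *m) {
    pthread_t self = pthread_self();
    pthread_mutex_lock(&m->meta);
    if (m->count > 0 && pthread_equal(m->owner, self)) {
        m->count++;  /* re-entry by the owner: only bump the depth */
        pthread_mutex_unlock(&m->meta);
        return;
    }
    pthread_mutex_unlock(&m->meta);
    pthread_mutex_lock(&m->lock);  /* first acquisition: may block */
    pthread_mutex_lock(&m->meta);
    m->owner = self;
    m->count = 1;
    pthread_mutex_unlock(&m->meta);
}

static void rec_mutex_unlock(rec_mutex *m) {
    pthread_mutex_lock(&m->meta);
    assert(m->count > 0 && pthread_equal(m->owner, pthread_self()));
    int remaining = --m->count;
    pthread_mutex_unlock(&m->meta);
    if (remaining == 0)
        pthread_mutex_unlock(&m->lock);  /* fully released */
}

/* Connection-style wrapper: a boolean, not a counter, records that the
 * lock is held, so a nested acquisition is detected rather than allowed. */
typedef struct {
    rec_mutex mutex;
    int have_lock;
} conn;

static void conn_init(conn *c) {
    rec_mutex_init(&c->mutex);
    c->have_lock = 0;
}

/* Returns 1 on a normal acquisition. Returns 0 (after undoing the nested
 * acquisition) if this thread already held the lock; an assertion in the
 * style of TOOK_LOCK_CHECK would abort at this point instead. */
static int conn_lock(conn *c) {
    rec_mutex_lock(&c->mutex);
    if (c->have_lock) {
        rec_mutex_unlock(&c->mutex);
        return 0;
    }
    c->have_lock = 1;
    return 1;
}

static void conn_unlock(conn *c) {
    c->have_lock = 0;
    rec_mutex_unlock(&c->mutex);
}
```

Because have_lock is a boolean, the second conn_lock() from the same thread is caught rather than counted, even though the underlying mutex would happily nest.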
The code that uses connection->mutex is instead supposed to avoid
holding that lock when it calls back out to user code. The reasons for this are:
- historical, the lock didn't use to be recursive
- back compat, nonrecursive locks can still be used with
the "set thread functions" api and this should only
mean that recursive dispatch is disallowed (you can't
connection_dispatch() from inside a message handler)
- sanity, recursion in a single thread can cause problems
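The following is a minimal sketch of the "don't hold connection->mutex while calling user code" rule, assuming a plain non-recursive mutex. The connection struct and the connection_* functions are hypothetical. The sketch models only the connection lock, not the separate dispatch mutex. The example handler re-enters the connection API. If dispatch kept the lock across the callback, that call would deadlock (or invoke undefined behavior) on a default pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct connection connection;
typedef void (*message_handler)(connection *conn, void *user_data);

/* Hypothetical connection object; all names are illustrative. */
struct connection {
    pthread_mutex_t mutex;  /* plain, non-recursive lock */
    int pending;            /* queued messages, guarded by mutex */
    message_handler handler;
    void *user_data;
};

static void connection_init(connection *c, message_handler h, void *data) {
    int rc = pthread_mutex_init(&c->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    c->pending = 0;
    c->handler = h;
    c->user_data = data;
}

static void connection_queue(connection *c) {
    pthread_mutex_lock(&c->mutex);
    c->pending++;
    pthread_mutex_unlock(&c->mutex);
}

/* Public accessor that takes the lock; handlers may call it. */
static int connection_get_pending(connection *c) {
    pthread_mutex_lock(&c->mutex);
    int n = c->pending;
    pthread_mutex_unlock(&c->mutex);
    return n;
}

/* Dispatch one message. State is updated under the lock, but the lock is
 * released before the user's handler runs, so the handler can call back
 * into connection_* functions without deadlocking on the plain mutex. */
static int connection_dispatch_one(connection *c) {
    pthread_mutex_lock(&c->mutex);
    if (c->pending == 0) {
        pthread_mutex_unlock(&c->mutex);
        return 0;
    }
    c->pending--;
    message_handler h = c->handler;  /* copy what we need while locked */
    void *data = c->user_data;
    pthread_mutex_unlock(&c->mutex);  /* drop the lock ... */
    if (h != NULL)
        h(c, data);                   /* ... then call out to user code */
    return 1;
}

/* Example handler that re-enters the connection API. */
static void record_pending(connection *conn, void *user_data) {
    int *seen = user_data;
    *seen = connection_get_pending(conn);
}
```

The cost of this design is that state may change between the unlock and the callback, so anything the handler needs has to be copied out while the lock is still held.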
Anyway, if we did see the same thread triggering this assertion, the
correct fix is probably to avoid recursing on connection->mutex, which
could involve rearranging the code a good bit.
But if I understood your verbose logs correctly, we have two different
threads both owning the lock in this case.
If it's two different threads getting the same lock, the solution
probably is not very complicated; it's just a matter of figuring out why
locks aren't enabled or why they don't work. For example, just stopping in
a debugger on the assertion and examining the fields in the lock struct
might reveal what's wrong.