Threading problems in (Win)DBus

Sat Jul 7 11:24:39 PDT 2007

Hi,

 From the traces, here two different threads take the lock at the same 
time (apparently?):

10357:   LOCK: _dbus_connection_acquire_dispatch [3082640272]
10357: 10357: lock socket_do_iteration post poll
10357:   LOCK: _dbus_connection_lock [3082643152]

And then here also:

11703:   LOCK: dbus_connection_send_with_reply [3082446544]
11703: Allocated slot 0 on allocator 0xb7eb583c total 1 slots allocated
1 used
11703:   LOCK: dbus_connection_get_is_connected [3082443664]

So that should not be possible, right? Unless the implementation of 
locking is messed up somehow (which it may well be) or the locks are 
no-ops since threads are for some reason not enabled.

Perhaps a few unit tests in dbus-sysdeps-pthread.c to be sure the 
mutexes work properly would not be a bad idea to add, they'd confirm or 
unconfirm this theory and prevent similar bugs in the future.

Some background info below on how the locks are supposed to work, to aid 
in debugging -

Christian Grigis wrote:
> Another problem that I noticed: for the pthread implementation of
> threads in dbus-sysdeps-pthread.c, the _dbus_pthread_mutex_lock()
> function appears to simulate a reentrant lock by using the pmutex->count
> counter. Wouldn't the CONNECTION_LOCK macro then fail in case a lock is
> acquired twice by the same thread (in TOOK_LOCK_CHECK)?
> 
> Are the two problems related?
> 

My memory is that while all the locks support recursion, only the 
dispatch_mutex is in fact supposed to be used recursively. That's why we 
left have_connection_lock as a boolean rather than a counter.

The code that uses connection->mutex is instead supposed to avoid 
holding that lock when it calls back out to user code. The reasons for 
this are:
  - historical, the lock didn't used to be recursive
  - back compat, nonrecursive locks can still be used with
    the "set thread functions" api and this should only
    mean that recursive dispatch is disallowed (you can't
    connection_dispatch() from inside a message handler)
  - sanity, recursion in a single thread can cause problems
    as well

Anyway, so if we saw the same thread triggering this assertion the 
correct fix is probably to avoid recursing on connection->mutex which 
could involved rearranging code a good bit.

But if I understood your verbose logs correctly, we have two different 
threads both owning the lock in this case.

If it's two different threads getting the same lock, the solution 
probably is not very complicated; it's just a matter of figuring out why 
locks aren't enabled or why they don't work. Just stopping in a debugger 
on the assertion and examining the fields in the lock struct for example 
might reveal what's wrong.

Havoc