Threading problems in (Win)DBus

Sun Jul 8 14:10:18 PDT 2007

Havoc Pennington wrote:
> Hi,
> 
>  From the traces, here two different threads take the lock at the same 
> time (apparently?):
> 
> 10357:   LOCK: _dbus_connection_acquire_dispatch [3082640272]
> 10357: 10357: lock socket_do_iteration post poll
> 10357:   LOCK: _dbus_connection_lock [3082643152]
> 
> And then here also:
> 
> 11703:   LOCK: dbus_connection_send_with_reply [3082446544]
> 11703: Allocated slot 0 on allocator 0xb7eb583c total 1 slots allocated
> 1 used
> 11703:   LOCK: dbus_connection_get_is_connected [3082443664]
> 
> So that should not be possible, right? Unless the implementation of 
> locking is messed up somehow (which it may well be) or the locks are 
> no-ops since threads are for some reason not enabled.
> 
> Perhaps a few unit tests in dbus-sysdeps-pthread.c to be sure the 
> mutexes work properly would not be a bad idea to add, they'd confirm or 
> unconfirm this theory and prevent similar bugs in the future.
> 
> Some background info below on how the locks are supposed to work, to aid 
> in debugging -
> 
> Christian Grigis wrote:
>> Another problem that I noticed: for the pthread implementation of
>> threads in dbus-sysdeps-pthread.c, the _dbus_pthread_mutex_lock()
>> function appears to simulate a reentrant lock by using the pmutex->count
>> counter. Wouldn't the CONNECTION_LOCK macro then fail in case a lock is
>> acquired twice by the same thread (in TOOK_LOCK_CHECK)?
>>
>> Are the two problems related?
>>
> 
> My memory is that while all the locks support recursion, only the 
> dispatch_mutex is in fact supposed to be used recursively. That's why we 
> left have_connection_lock as a boolean rather than a counter.
> 
> The code that uses connection->mutex is instead supposed to avoid 
> holding that lock when it calls back out to user code. The reasons for 
> this are:
>   - historical, the lock didn't used to be recursive
>   - back compat, nonrecursive locks can still be used with
>     the "set thread functions" api and this should only
>     mean that recursive dispatch is disallowed (you can't
>     connection_dispatch() from inside a message handler)
>   - sanity, recursion in a single thread can cause problems
>     as well
> 
> Anyway, so if we saw the same thread triggering this assertion the 
> correct fix is probably to avoid recursing on connection->mutex which 
> could involved rearranging code a good bit.
> 
> But if I understood your verbose logs correctly, we have two different 
> threads both owning the lock in this case.
> 
> If it's two different threads getting the same lock, the solution 
> probably is not very complicated; it's just a matter of figuring out why 
> locks aren't enabled or why they don't work. Just stopping in a debugger 
> on the assertion and examining the fields in the lock struct for example 
> might reveal what's wrong.
> 
> Havoc

Seems no mutexes are used which is possible:

void
_dbus_mutex_lock (DBusMutex *mutex)
{
  if (mutex)
    {
      if (thread_functions.recursive_mutex_lock)
        (* thread_functions.recursive_mutex_lock) (mutex);
      else if (thread_functions.mutex_lock)
        (* thread_functions.mutex_lock) (mutex);
    }
}

Does the test code also work with attached patch which checks
if the mutex is NULL.

After applying the patch also echo-server needs to initialize
the threading, DBus::_init_threading();

-- 
Peter Kümmel
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: connection_mutex.patch
Url: http://lists.freedesktop.org/archives/dbus/attachments/20070708/77615b2e/attachment.txt