Threading problems in (Win)DBus
olivier.hochreutiner at gmail.com
Mon Jul 9 07:08:27 PDT 2007
2007/7/7, Havoc Pennington <hp at redhat.com>:
> From the traces, here two different threads take the lock at the same
> time (apparently?):
> 10357: LOCK: _dbus_connection_acquire_dispatch 
> 10357: 10357: lock socket_do_iteration post poll
> 10357: LOCK: _dbus_connection_lock 
> And then here also:
> 11703: LOCK: dbus_connection_send_with_reply 
> 11703: Allocated slot 0 on allocator 0xb7eb583c total 1 slots allocated
> 1 used
> 11703: LOCK: dbus_connection_get_is_connected 
> So that should not be possible, right? Unless the implementation of
> locking is messed up somehow (which it may well be) or the locks are
> no-ops since threads are for some reason not enabled.
According to Christian's second post, threading was disabled in the
traces above. Now that he has enabled it, he sees exactly the same
behaviour under Linux that I see under Win32: when traces are enabled,
no assert fails, and when traces are disabled, an assert fails after
10-100 iterations of the while loop in Paolo's example. The failing
assert occurs in several different places, but it is always
"connection->have_connection_lock" or "connection->io_path_acquired".
I tried Peter's patch to test whether threading is really enabled, and
it is (in case you still doubted ;-)
Also note that the bug seems to be much easier to reproduce on fast
CPUs (if it can be reproduced anywhere else at all). But that does not
help much...
All the facts we have so far make me think that memory corruption or a
buffer overflow is involved in this problem, for the following reasons:
1) connection->have_connection_lock is set to TRUE when no one has the
lock (asserts I reported tell us that)
2) The only place where the connection mutex can be acquired and
released in the code is in the CONNECTION_LOCK / UNLOCK macros
3) connection->have_connection_lock is modified only in these two macros too
4) In C, it is not possible to have a pointer to an "unsigned int x:1"
bit-field (which BTW is remarkable, since you can usually point to
almost anything in this language :), thus the have_connection_lock
field of the DBusConnection structure cannot be modified through a
valid pointer (but it can be through an overflowed one)
5) I assume system-provided mutexes (pthread or win32) are not broken,
or at least not on *both* OS.
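
To make points 2) to 4) concrete, here is a minimal sketch of the
pattern I am describing. It is not the real D-Bus code: SketchConnection,
SKETCH_LOCK, SKETCH_UNLOCK and sketch_run are hypothetical names, and I
assume plain pthread mutexes. The flag is a one-bit bit-field touched
only inside the two macros, so with a working mutex the asserts should
never fire.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for DBusConnection; only what the argument needs. */
typedef struct {
    pthread_mutex_t mutex;
    unsigned int have_connection_lock : 1; /* debug flag, set only by the macros */
} SketchConnection;

/* Take the mutex, then check and set the flag while holding it. */
#define SKETCH_LOCK(c) do {                 \
    pthread_mutex_lock(&(c)->mutex);        \
    assert(!(c)->have_connection_lock);     \
    (c)->have_connection_lock = 1;          \
} while (0)

/* Check and clear the flag while still holding the mutex, then release. */
#define SKETCH_UNLOCK(c) do {               \
    assert((c)->have_connection_lock);      \
    (c)->have_connection_lock = 0;          \
    pthread_mutex_unlock(&(c)->mutex);      \
} while (0)

/* Note: "&conn.have_connection_lock" would not compile, because C forbids
 * taking the address of a bit-field (point 4). */

enum { SKETCH_THREADS = 4, SKETCH_ITERATIONS = 10000 };

static SketchConnection sketch_conn;
static long sketch_counter;

/* Each worker repeatedly takes the lock, bumps a counter, and releases it. */
static void *sketch_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SKETCH_ITERATIONS; i++) {
        SKETCH_LOCK(&sketch_conn);
        sketch_counter++;
        SKETCH_UNLOCK(&sketch_conn);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long sketch_run(void)
{
    pthread_t threads[SKETCH_THREADS];

    pthread_mutex_init(&sketch_conn.mutex, NULL);
    sketch_conn.have_connection_lock = 0;
    sketch_counter = 0;

    for (int i = 0; i < SKETCH_THREADS; i++)
        pthread_create(&threads[i], NULL, sketch_worker, NULL);
    for (int i = 0; i < SKETCH_THREADS; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&sketch_conn.mutex);
    return sketch_counter;
}
```

Calling sketch_run() from a small main (built with -pthread) should
return SKETCH_THREADS * SKETCH_ITERATIONS without tripping an assert.
If the same kind of assert fails in the real code, either the flag is
being written outside the lock or something else is overwriting it.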
Do you think my assumptions are correct?