dbus & threading (fix) ...

Michael Meeks michael.meeks at novell.com
Fri Mar 5 12:43:57 PST 2010


Hi guys,

	For some time there have been some threading nasties with dbus
that have hurt people implementing with it - AFAIR this was an issue
for gio, and it is certainly a big issue for Evolution (now using dbus
for it's calendaring and addressbook code). I have a prototype patch
to improve things.

	The most significant issue I have seen thus far is essentially
a latency problem introduced by the handling of connections & polling.
I append a full stack-trace of the issue, but in simple terms it is
this:

Thread <N>
#2  0xb75a8ffb in _dbus_poll (fds=0xaa68ef60, n_fds=1, timeout_milliseconds=9531) at dbus-sysdeps-unix.c:1938
#3  0xb75a11b7 in socket_do_iteration (transport=0xb5b0310, flags=6, timeout_milliseconds=9531) at dbus-transport-socket.c:1046
#4  0xb759f41f in _dbus_transport_do_iteration (transport=0xb5b0310, flags=<value optimized out>, timeout_milliseconds=9531) at dbus-transport.c:956
#5  0xb758a1be in _dbus_connection_do_iteration_unlocked (connection=0xaf5ec60, flags=6, timeout_milliseconds=9531) at dbus-connection.c:1150
#6  0xb758c7a3 in _dbus_connection_block_pending_call (pending=0xe5c5b40) at dbus-connection.c:2381
#7  0xb75997a1 in dbus_pending_call_block (pending=0xfffffdfc) at dbus-pending-call.c:707

Thread 1 - mainloop thread.
#2  0xb75a8176 in _dbus_pthread_condvar_wait (cond=0x8518dd8, mutex=0xaf5ed20) at dbus-sysdeps-pthread.c:231
#3  0xb759eb33 in _dbus_condvar_wait (cond=0x80, mutex=0x8ff1) at dbus-threads.c:254
#4  0xb7589d12 in _dbus_connection_acquire_io_path (connection=0xaf5ec60, timeout_milliseconds=-1) at dbus-connection.c:1051
#5  0xb758c92b in _dbus_connection_handle_watch (watch=0xab8d3f8, condition=1, data=0xaf5ec60) at dbus-connection.c:1425
#6  0xb75a1d24 in dbus_watch_handle (watch=0xab8d3f8, flags=1) at dbus-watch.c:663
#7  0xb77012c6 in io_handler_dispatch (source=0xab8d5d8, condition=0, data=0x8dc9c88) at edbusprivate.c:238
#8  0xb690616d in g_io_unix_dispatch (source=0x9e9e510, callback=0xb7701260 <io_handler_dispatch>, user_data=0x8dc9c88) at giounix.c:162

	The problem is - that for many synchronous calls done in
another thread, if there is a long running synchronous call
(eg. timing out over many seconds (25?)) happening in thread <N> -
while (perhaps) another async callback result comes back, the main
thread will wake on the (same) file descriptor and connection, and
block until it can get the io_path.

	In simpler terms, this means that all the work done to push a
long-running series of sync call out to another thread is pointless
they (often, but intermittently) block the main thread :-)

	I propose a simple patch to fix this:

diff --git a/dbus/dbus-connection.c b/dbus/dbus-connection.c
index 55e4cc2..e6130f5 100644
--- a/dbus/dbus-connection.c
+++ b/dbus/dbus-connection.c
@@ -1441,7 +1441,13 @@ _dbus_connection_handle_watch (DBusWatch                   *watch,
   _dbus_verbose ("%s start\n", _DBUS_FUNCTION_NAME);
   
   CONNECTION_LOCK (connection);
-  _dbus_connection_acquire_io_path (connection, -1);
+  if (!_dbus_connection_acquire_io_path (connection, 1))
+    {
+      /* another thread is handling the message */
+      CONNECTION_UNLOCK (connection);
+      return TRUE;
+    }
+
   HAVE_LOCK_CHECK (connection);
   retval = _dbus_transport_handle_watch (connection->transport,
                                          watch, condition);


	Essentially - if we cannot get the io_path - then we know that
another thread is either polling on this connection, or will poll on
it, and that if it is not going to - then, next time around the
main-loop our watch will be triggered again and we will have another
go.

	For me, this removes lots of UI freezes running Evolution
master, perhaps it is more widely useful.

	The question as to:

	"why choose to waste 1ms rather than use another condition ?"

	is a good one ;-) indeed, this is unfortunate. This works well
on linux. Without it (and with the appened patch) - unfortunately, we
sit in the g_lib mainloop polling

1267792373.00475511 I/O collision on socket 29 in (0x1) in thread 0xb5d14740, spin instead of blocking
.. omit 60 duplicates ...
1267792373.00477146 I/O collision on socket 29 in (0x1) in thread 0xb5d14740, spin instead of blocking
1267792373.00477212 I/O read from socket 29, in 0x1 out 0x1 in thread 0xb4dbcb70

	And hoping the scheduler will switch to the other thread with
a poll that wants to wake up - something it doesn't plan to do until
we burned an entire time-slice. Similarly doing a condition wait with
a zero timeout wastes a lot of cycles, there is no abstraction for a
sched_yield, and a 1ms condition sleep does what we need; since it is
(hopefully) a fairly rare condition, in general we don't loose a lot
of time (certainly less than we did before).

	Of course - something much more complicated, with the same
effect could be implemented, but IMHO the exisiting I/O scheme for
threading in dbus is somewhat over-complicated. IMHO the ORBit2
approach of splitting out an I/O thead to do centralised polling
(modeled itself on the UNO approach), that is activated when we first
do a call from a non-main thread is rather more elegant, simple, and
not noticably less efficient.

	David - do we have plans for gdbus to use an I/O thread ? and
preferably do the I/O ourselves, without linking libdbus - might be a
lot simpler and cleaner.

	Anyhow - I would love some more review; I've been running this
with evolution, and my whole system bus, session bus etc. for some
hours now without any noticable problems, of course 'make check' still 
passes nicely etc.

	Thoughts & review much appreciated,

	Thanks,

		Michael.

Thread 840 (Thread 0xaa68fb70 (LWP 7280)):
#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb67fb777 in poll () from /lib/libc.so.6
#2  0xb75a8ffb in _dbus_poll (fds=0xaa68ef60, n_fds=1, timeout_milliseconds=9531) at dbus-sysdeps-unix.c:1938
#3  0xb75a11b7 in socket_do_iteration (transport=0xb5b0310, flags=6, timeout_milliseconds=9531) at dbus-transport-socket.c:1046
#4  0xb759f41f in _dbus_transport_do_iteration (transport=0xb5b0310, flags=<value optimized out>, timeout_milliseconds=9531) at dbus-transport.c:956
#5  0xb758a1be in _dbus_connection_do_iteration_unlocked (connection=0xaf5ec60, flags=6, timeout_milliseconds=9531) at dbus-connection.c:1150
#6  0xb758c7a3 in _dbus_connection_block_pending_call (pending=0xe5c5b40) at dbus-connection.c:2381
#7  0xb75997a1 in dbus_pending_call_block (pending=0xfffffdfc) at dbus-pending-call.c:707
#8  0xb76f7368 in e_dbus_connection_send_dbus_1_message_with_reply_sync (error=<value optimized out>, cancellable=<value optimized out>, 
    timeout_msec=<value optimized out>, message=<value optimized out>, connection=<value optimized out>) at edbusconnection.c:1598
#9  e_dbus_connection_invoke_method_sync (error=<value optimized out>, cancellable=<value optimized out>, timeout_msec=<value optimized out>, 
    message=<value optimized out>, connection=<value optimized out>) at edbusconnection.c:3593
#10 0xb76ff2ce in e_dbus_proxy_invoke_method_sync (proxy=0xb7508e0 [EDBusProxy], method_name=0xb7be20f7 "getQuery", parameters=0xbc91f80, timeout_msec=-1, 
    cancellable=0x0, error=0xaa68f268) at edbusproxy.c:1296
#11 0xb7bc2dfb in e_data_cal_gdbus_get_query_sync (error=<value optimized out>, OUT_path=<value optimized out>, IN_sexp=<value optimized out>, 
    proxy=<value optimized out>) at e-data-cal-gdbus-bindings.h:412
#12 e_cal_get_query (error=<value optimized out>, OUT_path=<value optimized out>, IN_sexp=<value optimized out>, proxy=<value optimized out>) at e-cal.c:3945
#13 0xac28a10d in update_query_async (msg=0xdb342b0) at gnome-cal.c:1042
#14 0xac289110 in message_proxy (msg=0x253b) at gnome-cal.c:178
#15 0xb68fb036 in g_thread_pool_thread_proxy (data=0xb586360) at gthreadpool.c:265
#16 0xb68f99ef in g_thread_create_proxy (data=0xe84aaa8) at gthread.c:635
#17 0xb764c725 in start_thread () from /lib/libpthread.so.0
#18 0xb680626e in clone () from /lib/libc.so.6
..
Thread 1 (Thread 0xb6517740 (LWP 24769)):
#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb76503d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xb75a8176 in _dbus_pthread_condvar_wait (cond=0x8518dd8, mutex=0xaf5ed20) at dbus-sysdeps-pthread.c:231
#3  0xb759eb33 in _dbus_condvar_wait (cond=0x80, mutex=0x8ff1) at dbus-threads.c:254
#4  0xb7589d12 in _dbus_connection_acquire_io_path (connection=0xaf5ec60, timeout_milliseconds=-1) at dbus-connection.c:1051
#5  0xb758c92b in _dbus_connection_handle_watch (watch=0xab8d3f8, condition=1, data=0xaf5ec60) at dbus-connection.c:1425
#6  0xb75a1d24 in dbus_watch_handle (watch=0xab8d3f8, flags=1) at dbus-watch.c:663
#7  0xb77012c6 in io_handler_dispatch (source=0xab8d5d8, condition=0, data=0x8dc9c88) at edbusprivate.c:238
#8  0xb690616d in g_io_unix_dispatch (source=0x9e9e510, callback=0xb7701260 <io_handler_dispatch>, user_data=0x8dc9c88) at giounix.c:162
#9  0xb68cf448 in g_main_dispatch (context=<value optimized out>) at gmain.c:1960
#10 IA__g_main_context_dispatch (context=<value optimized out>) at gmain.c:2513
#11 0xb68d2bf3 in g_main_context_iterate (context=0x8075708, block=1, dispatch=1, self=0x807c600) at gmain.c:2591
#12 0xb68d30ba in IA__g_main_loop_run (loop=0x8943368) at gmain.c:2799
#13 0xb7059c69 in IA__gtk_main () at gtkmain.c:1218
#14 0x0804ab37 in main (argc=134990352, argv=0x1) at main.c:611


-- 
 michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbus-debug-instrumentation.diff
Type: text/x-patch
Size: 3436 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dbus/attachments/20100305/648ca81e/attachment.bin>


More information about the dbus mailing list