[Fwd: [Dbus-cxx-devel] Some 'sent' messages don't get sent]

Rick L. Vinyard, Jr. rvinyard at cs.nmsu.edu
Tue Sep 1 09:20:26 PDT 2009


---------------------------- Original Message ----------------------------
Subject: [Dbus-cxx-devel] Some 'sent' messages don't get sent
From:    "Patrick Allison" <patrick at phys.hawaii.edu>
Date:    Fri, August 28, 2009 2:51 am
To:      dbus-cxx-devel at lists.sourceforge.net
--------------------------------------------------------------------------

Hi all:

I'm having a bit of a problem with messages occasionally getting 'stuck'
and being unsent somehow. I've turned on libdbus debugging and dbus-cxx
debugging, and I think I see why it's happening, but I can't see where the
bug is.

One bug I *did* find that I really, really thought would fix it, but
didn't, was that Connection::on_remove_watch_callback was signalling
"signal_add_watch" rather than "signal_remove_watch", and similarly
Connection::on_watch_toggled_callback was signaling "signal_add_watch"
instead of "signal_watch_toggled" (connection.cpp, lines 845 and 851)

I see this problem when one process calls a method in another process, and
the reply message gets "lost" - it never shows up in a D-Bus monitor, and
I'm pretty sure it's never getting sent:

A "correct" reply looks like this. I put a debug statement in
Connection::send to see when 'send' is called, so that's what starts this
debugging output.

3081477008: Connection::send
16976:   LOCK: dbus_connection_send
16976: Message 0x88d7df8 (2 no path no interface no member 'i') for :1.36
added to outgoing queue 0x88d79d0, 1 pending to send
16976: Message 0x88d7df8 serial is 5
16976: _dbus_connection_do_iteration_unlocked start
16976:   UNLOCK: _dbus_connection_acquire_io_path
16976: _dbus_connection_acquire_io_path locking io_path_mutex
16976: _dbus_connection_acquire_io_path start connection->io_path_acquired
= 0 timeout = 0
16976: _dbus_connection_acquire_io_path end connection->io_path_acquired =
1 we_acquired = 1
16976: _dbus_connection_acquire_io_path unlocking io_path_mutex
16976:   LOCK: _dbus_connection_acquire_io_path
16976: Transport iteration flags 0x1 timeout -1 connected = 1
16976:  iteration flags = write timeout = -1 read_watch = 0x88d7680
write_watch = 0x88d7658 fd = 3
16976: do_writing(), have_messages = 1, fd = 3
16976:  wrote 52 bytes of 52
16976: Message 0x88d7df8 (2 no path no interface no member 'i') removed
from outgoing queue 0x88d79d0, 0 left to send
16976: check_write_watch(): needed = 0 on connection 0x88d79d0 watch
0x88d7658 fd = 3 outgoing messages exist 0
16976:   UNLOCK: protected_change_watch
16976:   LOCK: protected_change_watch
16976:  ... leaving do_iteration()
16976: _dbus_transport_do_iteration end
16976: _dbus_connection_release_io_path locking io_path_mutex
16976: _dbus_connection_release_io_path start connection->io_path_acquired
= 1
16976: _dbus_connection_release_io_path unlocking io_path_mutex
16976: _dbus_connection_do_iteration_unlocked end
16976: _dbus_connection_send_preallocated_and_unlock middle
16976: dispatch status = complete is_connected = 1
16976:   UNLOCK: _dbus_connection_update_dispatch_status_and_unlock
3081477008: Object::handle_message: message was handled

And here's one from later in that session, when the message was "lost" and
the reply timed out:

3081477008: Connection::send
16976:   LOCK: dbus_connection_send
16976: Message 0x88d9bf0 (2 no path no interface no member 's') for :1.37
added to outgoing queue 0x88d79d0, 1 pending to send
16976: Message 0x88d9bf0 serial is 8
16976: _dbus_connection_do_iteration_unlocked start
16976:   UNLOCK: _dbus_connection_acquire_io_path
16976: _dbus_connection_acquire_io_path locking io_path_mutex
16976: _dbus_connection_acquire_io_path start connection->io_path_acquired
= 1 timeout = 0
16976: _dbus_connection_acquire_io_path waiting 0 for IO path to be
acquirable
16976: _dbus_connection_acquire_io_path end connection->io_path_acquired =
1 we_acquired = 0
16976: _dbus_connection_acquire_io_path unlocking io_path_mutex
16976:   LOCK: _dbus_connection_acquire_io_path
16976: _dbus_connection_do_iteration_unlocked end
3081477008: Dispatcher::on_wakeup_main
16976: _dbus_connection_send_preallocated_and_unlock middle
16976: dispatch status = complete is_connected = 1
16976:   UNLOCK: _dbus_connection_update_dispatch_status_and_unlock
3081477008: Object::handle_message: message was handled

>From what I've been able to figure out, whenever dbus_connection_send
can't write to the socket immediately, it's supposed to queue the message,
and if the queue went from 0->1 it's supposed to toggle the write watch to
tell the main thread to select()/poll() on the write fd.

As far as I can tell, that's not happening for some reason. I never see
on_watch_toggled_callback called, and the write watch is never enabled
(it's added to the watch list unenabled).

The next time that Connection::send is called, it had 2 pending to send,
then 3 pending to send, etc., so clearly the write watch isn't being
handled. At that point the programs which are connecting to it usually die
due to a timeout expiring, so it's not a temporary thing.

I worked around the problem by having Connection::send call
Connection::flush, but that's not exactly the ideal solution. Works for
now, though...

I'm attaching the full debug log - it's pretty verbose, and there's a lot
going on (there's output from the main program as well, and also a few
additional debugging outputs I added).

(Note that this is libdbus 1.2.16., and dbus-cxx 0.4.1)

Patrick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AmCtl.log.gz
Type: application/x-gzip
Size: 5342 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/dbus/attachments/20090901/4015439c/attachment.bin 


More information about the dbus mailing list