libdbus _dbus_connection_block_pending_call blocks until timeout, even after reply is received

Manish Narang Manish.Narang at kpit.com
Fri Dec 15 10:38:51 UTC 2017


Hi All,

We are facing a problem that we believe to be caused by libdbus in a multithreaded environment. It's reproducibility is extremely low, we've only seen it twice so far. Can you please advise if this has been already addressed, or in some way a flaw in our usage of libdbus.

Problem -

The watchdog implementation in our system detected a deadlock in an application thread and killed the process. Upon examining the backtrace it was found that the process was stuck on a dbus_connection_send_with_reply_and_block call.
What is perplexing is that the call is waiting for a reply that has already been processed by another thread in the process.

>From our logs -

00:10:02.759: 3|  404| 1315|-    |AbortDownload at kontactsynchronizer_p.h:366|Enter // The proxy API call
00:10:02.759: 3|  404| 1315|-    |addTimeout at kmainloop.cpp:234|CORE::Adding timer 361 with timeout of 23000 // libdbus asked us to add a timer
00:10:02.768: 3|  404|  438|-    |removeTimeout at kmainloop.cpp:302|CORE::Removing timer 361 // libdbus asked us to remove the added timer, indicating that the reply has been received in 9 ms.

Our theory for the rootcause of the defect

1. ThreadA calls dbus_connection_send_with_reply_and_block, which sends data out over the socket.
2. ThreadA checks if the reply message has been received in _dbus_connection_block_pending_call. This method peeks the incoming message queue to see if the reply has already arrived. Answer is No, and hence it is decided that poll is needed.
3. Context switch causes ThreadB to wake up, which happens to be the a threadpool thread monitoring the dbus socket for incoming data. This thread calls dbus_watch_handle and incoming message is moved to the pending queue.
4. ThreadA now resumes, and it starts polling the socket for the pending reply, as it is unaware of the reply message already handled by ThreadB.
5. Since the poll timeout is 25000 milliseconds, this call will now be blocked until some data arrives for this process. This means that in the case of complete radio silence, this call will take full 25 seconds to return even though the reply has already arrived in the process.

If the poll begins before the message is received, or if the poll is woken up by some other incoming message, then libdbus behaves correctly.

I see that something similar was being discussed here - https://sourceforge.net/p/windbus/mailman/windbus-svn/?viewmonth=200705. Apparently John Palmieri submitted a patch to address it, but the link is just a changelog and I cannot seem to track down the actual change.
Excerpt from John's update on the link -
"* dbus/dbus-connection.c:
(_dbus_connection_block_pending_call):
Check to see if our data has already been read off the connection
by another blocking pending call before we block in poll.
(check_for_reply_and_update_dispatch_unlocked):
Code taken from _dbus_connection_block_pending_call - checks for
an already read reply and updates the dispatch if there is one."

Backtrace -

#0  0xb6857490 in poll () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0xb65bf044 in _dbus_poll (fds=<optimized out>, n_fds=<optimized out>, timeout_milliseconds=<optimized out>)
    at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-sysdeps-unix.c:2761
No locals.
#2  0xb65b88a8 in socket_do_iteration (transport=0x2483ff0, flags=6, timeout_milliseconds=<optimized out>)
    at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-transport-socket.c:1167
        saved_errno = <optimized out>
        socket_transport = 0x2483ff0
        poll_fd = {fd = 5, events = 1, revents = 0}
        poll_res = <optimized out>
        poll_timeout = <optimized out>
#3  0xb65b77f0 in _dbus_transport_do_iteration (transport=0x2483ff0, flags=1, timeout_milliseconds=24993)
    at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-transport.c:1001
No locals.
#4  0xb659e990 in _dbus_connection_do_iteration_unlocked (connection=0x2484488, pending=0x0, flags=6, timeout_milliseconds=24993)
    at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:1227
No locals.
#5  0xb659f4dc in _dbus_connection_block_pending_call (pending=0x26b5638) at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:2527
        start_tv_sec = 602
        start_tv_usec = 761037
        tv_sec = 602
        tv_usec = 768470
        status = DBUS_DISPATCH_DATA_REMAINS
        connection = 0x2484488
        client_serial = 3059348700
        timeout = 0x26b3768
        timeout_milliseconds = 25000
        elapsed_milliseconds = 7
#6  0xb659fa00 in dbus_connection_send_with_reply_and_block (connection=<optimized out>, message=0xb41054f0, timeout_milliseconds=-1, error=0xa9e059e8)
    at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:3575
        reply = <optimized out>
        pending = 0x26b5638
        __FUNCTION__ = "dbus_connection_send_with_reply_and_block"
#7  0xb6c56f78 in kivi::ipc::dbus::DBus::Send (this=0x2480e70, msg=..., result=...) at /usr/src/debug/kividbus/1.0-r0/git/dbusipc.cpp:446

Could you please advise how to address this in libdbus?


Thanks and Regards,

Manish Narang

Design Lead – Infotainment

Mobile : +918983509995 | manish.narang at kpit.com<mailto:manish.narang at kpit.com>

————————————————————————————————
KPIT|Web:  www.kpit.com<http://www.kpit.com>  | Social: Facebook<http://www.kpit.com/facebook> | Twitter<http://www.kpit.com/twitter> | Linkedin<http://www.kpit.com/linkedin> | Youtube<http://www.kpit.com/youtube>

This message contains information that may be privileged or confidential and is the property of the KPIT Technologies Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Technologies Ltd. does not accept any liability for virus infected mails.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dbus/attachments/20171215/d64421e6/attachment.html>


More information about the dbus mailing list