libdbus _dbus_connection_block_pending_call blocks until timeout, even after reply is received
Manish Narang
Manish.Narang at kpit.com
Fri Dec 15 10:38:51 UTC 2017
Hi All,
We are facing a problem that we believe to be caused by libdbus in a multithreaded environment. It's reproducibility is extremely low, we've only seen it twice so far. Can you please advise if this has been already addressed, or in some way a flaw in our usage of libdbus.
Problem -
The watchdog implementation in our system detected a deadlock in an application thread and killed the process. Upon examining the backtrace it was found that the process was stuck on a dbus_connection_send_with_reply_and_block call.
What is perplexing is that the call is waiting for a reply that has already been processed by another thread in the process.
>From our logs -
00:10:02.759: 3| 404| 1315|- |AbortDownload at kontactsynchronizer_p.h:366|Enter // The proxy API call
00:10:02.759: 3| 404| 1315|- |addTimeout at kmainloop.cpp:234|CORE::Adding timer 361 with timeout of 23000 // libdbus asked us to add a timer
00:10:02.768: 3| 404| 438|- |removeTimeout at kmainloop.cpp:302|CORE::Removing timer 361 // libdbus asked us to remove the added timer, indicating that the reply has been received in 9 ms.
Our theory for the rootcause of the defect
1. ThreadA calls dbus_connection_send_with_reply_and_block, which sends data out over the socket.
2. ThreadA checks if the reply message has been received in _dbus_connection_block_pending_call. This method peeks the incoming message queue to see if the reply has already arrived. Answer is No, and hence it is decided that poll is needed.
3. Context switch causes ThreadB to wake up, which happens to be the a threadpool thread monitoring the dbus socket for incoming data. This thread calls dbus_watch_handle and incoming message is moved to the pending queue.
4. ThreadA now resumes, and it starts polling the socket for the pending reply, as it is unaware of the reply message already handled by ThreadB.
5. Since the poll timeout is 25000 milliseconds, this call will now be blocked until some data arrives for this process. This means that in the case of complete radio silence, this call will take full 25 seconds to return even though the reply has already arrived in the process.
If the poll begins before the message is received, or if the poll is woken up by some other incoming message, then libdbus behaves correctly.
I see that something similar was being discussed here - https://sourceforge.net/p/windbus/mailman/windbus-svn/?viewmonth=200705. Apparently John Palmieri submitted a patch to address it, but the link is just a changelog and I cannot seem to track down the actual change.
Excerpt from John's update on the link -
"* dbus/dbus-connection.c:
(_dbus_connection_block_pending_call):
Check to see if our data has already been read off the connection
by another blocking pending call before we block in poll.
(check_for_reply_and_update_dispatch_unlocked):
Code taken from _dbus_connection_block_pending_call - checks for
an already read reply and updates the dispatch if there is one."
Backtrace -
#0 0xb6857490 in poll () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0xb65bf044 in _dbus_poll (fds=<optimized out>, n_fds=<optimized out>, timeout_milliseconds=<optimized out>)
at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-sysdeps-unix.c:2761
No locals.
#2 0xb65b88a8 in socket_do_iteration (transport=0x2483ff0, flags=6, timeout_milliseconds=<optimized out>)
at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-transport-socket.c:1167
saved_errno = <optimized out>
socket_transport = 0x2483ff0
poll_fd = {fd = 5, events = 1, revents = 0}
poll_res = <optimized out>
poll_timeout = <optimized out>
#3 0xb65b77f0 in _dbus_transport_do_iteration (transport=0x2483ff0, flags=1, timeout_milliseconds=24993)
at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-transport.c:1001
No locals.
#4 0xb659e990 in _dbus_connection_do_iteration_unlocked (connection=0x2484488, pending=0x0, flags=6, timeout_milliseconds=24993)
at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:1227
No locals.
#5 0xb659f4dc in _dbus_connection_block_pending_call (pending=0x26b5638) at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:2527
start_tv_sec = 602
start_tv_usec = 761037
tv_sec = 602
tv_usec = 768470
status = DBUS_DISPATCH_DATA_REMAINS
connection = 0x2484488
client_serial = 3059348700
timeout = 0x26b3768
timeout_milliseconds = 25000
elapsed_milliseconds = 7
#6 0xb659fa00 in dbus_connection_send_with_reply_and_block (connection=<optimized out>, message=0xb41054f0, timeout_milliseconds=-1, error=0xa9e059e8)
at /usr/src/debug/dbus/1.10.8-r0/git/dbus/dbus-connection.c:3575
reply = <optimized out>
pending = 0x26b5638
__FUNCTION__ = "dbus_connection_send_with_reply_and_block"
#7 0xb6c56f78 in kivi::ipc::dbus::DBus::Send (this=0x2480e70, msg=..., result=...) at /usr/src/debug/kividbus/1.0-r0/git/dbusipc.cpp:446
Could you please advise how to address this in libdbus?
Thanks and Regards,
Manish Narang
Design Lead – Infotainment
Mobile : +918983509995 | manish.narang at kpit.com<mailto:manish.narang at kpit.com>
————————————————————————————————
KPIT|Web: www.kpit.com<http://www.kpit.com> | Social: Facebook<http://www.kpit.com/facebook> | Twitter<http://www.kpit.com/twitter> | Linkedin<http://www.kpit.com/linkedin> | Youtube<http://www.kpit.com/youtube>
This message contains information that may be privileged or confidential and is the property of the KPIT Technologies Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Technologies Ltd. does not accept any liability for virus infected mails.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dbus/attachments/20171215/d64421e6/attachment.html>
More information about the dbus
mailing list