libdbus _dbus_connection_block_pending_call blocks until timeout, even after reply is received

Manish Narang Manish.Narang at kpit.com
Thu Jan 25 11:39:44 UTC 2018



On Friday 15 December 2017 11:46 PM, Simon McVittie wrote:
> For best results please talk to the mailing list or the bug tracker,
> not to me personally - I'm not always going to be around, but if your
> messages are on the mailing list or bug tracker, others can see them too.
>

Sorry, I just did a reply to the mail, and my mail client chose to reply to Simon directly.

> On Fri, 15 Dec 2017 at 15:38:47 +0000, Manish Narang wrote:
>>> This might be similar to, or the same root cause as,
>>> <https://bugs.freedesktop.org/show_bug.cgi?id=102839>
>>> which also has more general information on libdbus and threading.
>> The ticket 102839 looks like an exact match.
>> Your comment https://bugs.freedesktop.org/show_bug.cgi?id=102839#c12
>> shows you have roughly figured out a potential fix.
>> However, it seems that your discussions with Michael on the ticket are
>> yet to conclude.
>> Do you think investigating this proposed fix further is worthwhile ?
>> Have you tried this or is this a theory thus far?
> I have no way to reproduce this bug, so my side of #102839 is all based on
> theory and reading the source code. If you can provide a minimal test-case
> (preferably pure libdbus) that would be very helpful - especially if
> you're willing to put it under a license that we can use in the test suite
> to try to stop this coming back.

I've written a small test client and server to help reproduce this
problem. It's under AFL2.1 license, same as libdbus.
The client just bombards the server with method calls from multiple
threads. In a matter of a few hundred calls the client gets stuck.
Upon inspecting the execution under gdb, following is the back-trace for
the stuck thread. It shows that the dbus_pending_call_block blocks on
I/O, even though the reply has been received for the message.

(gdb) t 15
[Switching to thread 15 (Thread 0x7ffff0d4f700 (LWP 1620))]
#0  0x00007ffff765870d in poll () at ../sysdeps/unix/syscall-template.S:84
84    ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff765870d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff7ba3832 in poll (__timeout=<optimized out>,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  _dbus_poll (fds=<optimized out>, n_fds=<optimized out>,
timeout_milliseconds=<optimized out>) at dbus-sysdeps-unix.c:2761
#3  0x00007ffff7b8d5bf in socket_do_iteration (transport=0x603740,
flags=6, timeout_milliseconds=<optimized out>) at
dbus-transport-socket.c:1167
#4  0x00007ffff7b8b717 in _dbus_transport_do_iteration
(transport=0x603740, flags=flags at entry=6,
timeout_milliseconds=timeout_milliseconds at entry=25000) at
dbus-transport.c:1001
#5  0x00007ffff7b5e12c in _dbus_connection_do_iteration_unlocked
(connection=connection at entry=0x603ea0, pending=pending at entry=0x0,
flags=flags at entry=6,
timeout_milliseconds=timeout_milliseconds at entry=25000) at
dbus-connection.c:1227
#6  0x00007ffff7b5f8e3 in _dbus_connection_block_pending_call
(pending=0x7fffb0000cf0) at dbus-connection.c:2527
#7  0x00007ffff7b80cd6 in dbus_pending_call_block (pending=<optimized
out>) at dbus-pending-call.c:741
#8  0x00007ffff7b60230 in dbus_connection_send_with_reply_and_block
(connection=0x603ea0, message=0x7fffb0001180,
timeout_milliseconds=25000, error=0x7ffff0d4eee0) at dbus-connection.c:3575
#9  0x00000000004011ff in thread_func ()
#10 0x00007ffff792e6ba in start_thread (arg=0x7ffff0d4f700) at
pthread_create.c:333
#11 0x00007ffff76643dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 6
#6  0x00007ffff7b5f8e3 in _dbus_connection_block_pending_call
(pending=0x7fffb0000cf0) at dbus-connection.c:2527
2527              _dbus_connection_do_iteration_unlocked (connection,
(gdb) set print pretty
(gdb) p *pending
$1 = {
   refcount = {
     value = 2
   },
   slot_list = {
     slots = 0x0,
     n_slots = 0
   },
   function = 0x0,
   connection = 0x603ea0,
   reply = 0x7fffa8001c00,
   timeout = 0x7fffb00010d0,
   timeout_link = 0x7fffa8000f50,
   reply_serial = 79,
   completed = 1, // Pending call completed is already set to TRUE
   timeout_added = 0
}
> https://bugs.freedesktop.org/show_bug.cgi?id=102839#c12 outlines what I
> think a solution should probably look like, but I can't test it and I
> only have a limited amount of time to work on dbus, so I haven't written
> a specific patch along those lines. It isn't clear to me exactly what
> Michael has tried: https://bugs.freedesktop.org/show_bug.cgi?id=102839#c16
> suggests that he may have tried something similar to what I suggested,
> but doesn't say exactly what, so I can't draw any conclusions about why
> whatever he tried didn't work.

I created and tested a patch (also attached herewith) based on your
suggestions in https://bugs.freedesktop.org/show_bug.cgi?id=102839#c12
The patch seems to fix the problem, and the test client successfully
finishes all the calls without blocking.

> It's also possible that something in my analysis was wrong - if that's
> the case, I'm happy to be corrected. Please double-check my reasoning?

As already mentioned, the patch addresses the issue and the root-cause
seems valid. Protecting pending->completed with the connection lock
ensures that it is not updated by some other thread, while the blocked
thread drops connection lock temporarily to acquire the I/O path. We now
always hit the "pending call completed while acquiring I/O path" branch
in such a scenario now.

> Sorry, I'm not going to merge Michael's patch without knowing that it's
> correct, and I can't do that without knowing exactly what it fixes -
> future maintainers need this information to avoid making this stuff
> regress.

I understand. Hopefully you have enough information to integrate the
attached patch now. Please provide a confirmation If this patch is Ok. I
will maintain this patch locally after you confirm, since I cannot
migrate to the latest release in my production system just for this fix.
If there is a formal channel to submit a patch for upstream acceptance,
please let me know and I'll route the submission through there. I am not
well versed with this process, I'm sorry.

PS - libdbus version used for this exercise - 1.10.8
Also PS - I see intermittent failures saying -
"dbus_connection_send_with_reply Method "add_numbers" with signature "s"
on interface "com.test.dbus_example" doesn't exist". I haven't been able
to debug this, but since it doesn't interfere with the defect we're
trying to fix and I don't get such errors in my production code, I've
attributed these errors to a haphazard test implementation and ignored
them.

Thanks,
Manish
This message contains information that may be privileged or confidential and is the property of the KPIT Technologies Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Technologies Ltd. does not accept any liability for virus infected mails.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libdbus_pending_call_stuck_test.tar
Type: application/x-tar
Size: 20480 bytes
Desc: libdbus_pending_call_stuck_test.tar
URL: <https://lists.freedesktop.org/archives/dbus/attachments/20180125/46de877f/attachment-0001.tar>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libdbus_102839.patch
Type: text/x-patch
Size: 3984 bytes
Desc: libdbus_102839.patch
URL: <https://lists.freedesktop.org/archives/dbus/attachments/20180125/46de877f/attachment-0001.bin>


More information about the dbus mailing list