send_with_reply_and_block returning only after pending call times out

Sriram Gopalan mgsram at gmail.com
Tue Jul 31 19:12:18 UTC 2018


Note: We are using a patched version of D-bus (on top of 1.10.24). Also we
are using the reference implementation directly. Unfortunately it is not
possible for us to upgrade the version in this revision of the product.

This is a wierd issue that is seen very rarely. We have infotainment test
benches for our system that runs 24/7 and once in a while, we have come
across as issue where in there is no response to a blocked D-Bus send call;
as if the message is lost. Unfortunately, there is no specific use case or
process.

The issue is not seen at all if verbose logging is enabled in D-Bus or for
that matter even in our code. By means of careful conditional logging over
several days, we have been able to isolate it to the client process and the
rough sequence of events is as follows:
1) Client process C calls send_with_reply_and_block and is waiting for reply
2) Server process S acts on the call on sends a reply out
3) C never sees the response and eventually the pending call times out.
4) Things work normally from here on. I.e. once in a while, there is
glitch. That's all.

There is nothing reported in the journal logs by the daemon and I cannot
suspect that the message is lost in transaction.

Through our logging exercise, we could also narrow it down to being stuck
at line 2530 of dbus-connection.c[1] excerpt of which follows.

<code>
/* block again, we don't have the reply buffered yet. */
_dbus_connection_do_iteration_unlocked (connection,
pending,
DBUS_ITERATION_DO_READING |
DBUS_ITERATION_BLOCK,
timeout_milliseconds - elapsed_milliseconds);

</code>


By going through the code, trying to follow the flow, I could not
understand as to why it was not blocked by the previous call to
_dbus_connection_do_iteration_unlocked.

I have the following questions:

1) Could this possibly be caused by back to back calls to
dbus_watch_handle() on the same watch object after event loop has detected
I/O? There is a very remote possibility of this happening in the event loop
logic, but is not completely ruled out. (I will fix this and post again
with the results)

2) Under what circumstances we will hit the above code block or rather
escape the previous call to _dbus_connection_do_iteration_unlocked

3) Do you have any additional pointers in terms of where to look / what to
log etc. to be able to nail this down better.


Thanking in anticipation,

Sriram


1. https://dbus.freedesktop.org/doc/api/html/dbus-connection_8c_source.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dbus/attachments/20180801/26ab4204/attachment.html>


More information about the dbus mailing list