method timeouts vs systemd activation and slow systems

David Rheinsberg david at readahead.eu
Wed Mar 8 08:04:38 UTC 2023


Hi

On Mon, Mar 6, 2023, at 1:39 PM, Simon McVittie wrote:
> There are some timeouts in the "limits" data structure of the message
> bus implementation (dbus-daemon or dbus-broker) which are somewhat
> orthogonal to the client-side method call timeout. In dbus-daemon
> configuration language:
>
> * service_start_timeout (default 25s)
> * auth_timeout (default 5s)
> * pending_fd_timeout (default 150s)

For the record, we do not use those timeouts in dbus-broker. Service start is under the control of systemd, and we do not implement the other two timeouts because we have resource accounting in both situations (which requires rather recent Linux APIs and does not necessarily have an equivalent on other platforms).


Regarding the proposal: I can only recommend dropping timeouts in dbus transactions. We have had excellent experience with this approach. Our reasoning is quite simple: Any high-level application will have some watchdog infrastructure that already ensures it is restarted if it is stuck. By adding timeouts to low-level operations you do not improve the resiliency. The only advantage I see is that you *might* notice a stuck operation earlier, but really only if your low-level timeouts are lower than your high-level watchdogs. But the cost is high: you make the entire system more complex, you suddenly have to deal with conflicting timeouts in different communication APIs, and you suddenly run into false positives where your timeout was simply too small and thus unsuitable for the target platform.

I do not believe transaction timeouts for reliable channels like dbus contribute to system resiliency or recoverability. I also believe those timeouts do not make sense for low-level APIs. If you ensure your operations can be cancelled, I believe you should let the upper levels control the timeouts of the entire action rather than each individual step.
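
To illustrate the idea of one upper-level deadline over cancellable steps, here is a minimal sketch in Python asyncio. It is not dbus or dbus-broker code: `step`, `action` and `run_with_deadline` are hypothetical names, and `asyncio.sleep` merely stands in for a method call that has no timeout of its own.

```python
import asyncio


async def step(name, delay):
    # Hypothetical low-level operation: it has no timeout of its own,
    # but it can be cancelled while it is waiting.
    await asyncio.sleep(delay)
    return name


async def action(delays):
    # A high-level action made of several low-level steps, run in order.
    results = []
    for i, delay in enumerate(delays):
        results.append(await step(f"step{i}", delay))
    return results


async def run_with_deadline(delays, deadline):
    # The only timeout lives here and covers the entire action.
    try:
        return await asyncio.wait_for(action(delays), timeout=deadline)
    except asyncio.TimeoutError:
        # wait_for has already cancelled whichever step was pending.
        return None
```

With this shape, a slow system only needs one number adjusted (the deadline of the whole action), and no individual step can fail just because its private timeout was too small.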


Lastly, note that whenever you time out a method transaction in a dbus client, you leak the reply-window in the server. Unless the other side eventually replies or disconnects, you will accumulate those reply-windows and eventually reach your resource limit.
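
The accumulation can be shown with a toy model. This is only a sketch under loud assumptions: `ToyBus`, its `max_pending` quota and the single shared set of serials are invented for illustration and do not reflect how dbus-daemon or dbus-broker actually account for pending replies.

```python
class ToyBus:
    """Toy model of pending-reply accounting (not real bus code)."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # hypothetical quota
        self.pending = set()            # serials still awaiting a reply

    def method_call(self, serial):
        if len(self.pending) >= self.max_pending:
            raise RuntimeError("pending-reply quota exceeded")
        self.pending.add(serial)

    def reply(self, serial):
        # A reply from the callee releases the slot.
        self.pending.discard(serial)

    def callee_disconnected(self):
        # A disconnect of the callee releases all of its slots.
        self.pending.clear()


def calls_until_quota(max_pending):
    # The callee never replies; the client "times out" after every call.
    bus = ToyBus(max_pending)
    serial = 0
    while True:
        serial += 1
        try:
            bus.method_call(serial)
        except RuntimeError:
            return serial - 1
        # The client-side timeout fires here: the client stops waiting,
        # but nothing tells the bus, so the slot stays occupied.
```

In this model the client gives up on every call, yet the bus keeps each slot until `reply` or `callee_disconnected` runs, so the quota is exhausted after exactly `max_pending` calls.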

Thanks
David

