method timeouts vs systemd activation and slow systems

Thu Mar 2 15:15:23 UTC 2023

A while ago my role switched from working on the Linux desktop to working on Linux servers, and since then I've discovered that servers are a bit more of a wild world.

Long ago, I think, Havoc chose to default to a 25 second timeout for DBus method calls.  This is generally OK on mostly idle single-user desktop systems (particularly modern laptops and desktops with NVMe drives).

But servers... well, it turns out that some people want to run on under-resourced (or rather, over-provisioned) private OpenStack instances, for example, and there I/O can be *really* slow.  On those systems we start seeing the 25 second DBus method call timeout show up as system failures.

Later, systemd came along and started defaulting to 90 seconds for service activation.  IMO this is a more reasonable timeout.

But crucially, the systemd activation timeout can be conveniently configured *system wide*.  That's what I want here: a way for the OS to detect this situation and globally adjust DBus method call timeouts.

Now unfortunately today, these timeouts are hardcoded in client code.  But...here's my strawman:

- Change dbus-{daemon,broker} to have a config option for the default method call timeout
- Default that option to the systemd service start timeout
- Fix client libraries to query the daemon for the timeout (or parse the configs on their own? not sure)
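
To make the last bullet a bit more concrete, here is a minimal Python sketch of the "parse the configs on their own" variant. It assumes the client is handed the text of a systemd manager config (for example the contents of /etc/systemd/system.conf) and reads `DefaultTimeoutStartSec` from the `[Manager]` section. The function names, the fallback choices, and the tiny time-span parser are all hypothetical, and nothing here is an existing DBus or systemd API. It also ignores drop-in directories and most of the unit suffixes systemd accepts.

```python
import configparser
import re

# The 25 second client-side default described above.
LEGACY_DBUS_TIMEOUT_SEC = 25.0
# The 90 second systemd start timeout described above, used when the key is unset.
SYSTEMD_DEFAULT_START_TIMEOUT_SEC = 90.0

# Only a subset of systemd's time-span units; the real parser accepts more.
_UNIT_SECONDS = {
    "": 1.0, "s": 1.0, "sec": 1.0,
    "ms": 0.001, "msec": 0.001,
    "m": 60.0, "min": 60.0,
    "h": 3600.0, "hr": 3600.0,
}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]*)")


def parse_time_span(text):
    """Convert a span such as '90s' or '1min 30s' to seconds.

    Returns None for 'infinity'; raises ValueError on anything unsupported.
    """
    text = text.strip().lower()
    if text == "infinity":
        return None
    if not text:
        raise ValueError("empty time span")
    total = 0.0
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if not match or match.group(2) not in _UNIT_SECONDS:
            raise ValueError(f"unsupported time span: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
        # Skip whitespace between components like '1min 30s'.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return total


def dbus_call_timeout_sec(system_conf_text):
    """Pick a method call timeout (seconds) from systemd manager config text.

    None means "no timeout". Falling back to 25 seconds on unparseable
    values is a choice made for this sketch, not established behavior.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(system_conf_text)
    raw = parser.get("Manager", "DefaultTimeoutStartSec", fallback=None)
    if raw is None:
        return SYSTEMD_DEFAULT_START_TIMEOUT_SEC
    try:
        return parse_time_span(raw)
    except ValueError:
        return LEGACY_DBUS_TIMEOUT_SEC


# Hypothetical config an admin might ship on a slow-I/O system.
EXAMPLE_CONF = """
[Manager]
DefaultTimeoutStartSec=5min
"""
```

With the example config, `dbus_call_timeout_sec(EXAMPLE_CONF)` gives 300.0 seconds, and an empty config falls back to 90.0. A client library would then use that value (converted to whatever unit it expects) as its default method call timeout instead of a hardcoded 25 seconds.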

Also xref the related https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer and my comment https://github.com/coreos/fedora-coreos-tracker/issues/1404#issuecomment-1412827914