[systemd-devel] Monitoring unit and overall state

Tomáš Pecka peckato1 at fit.cvut.cz
Fri Jun 5 12:42:46 UTC 2020


Hello,

we are currently working on a project where we'd like to let users know
the health status of the system's services. By "health status" we mean
"is every unit in systemd OK?".
For example, when a daemon crashes we want to switch the red light on
and keep it on until the service is started again. Once the service is
active again (and nothing else has failed in the meantime), we switch
the green light back on.

At first, we thought that it would be sufficient to monitor the manager's
SystemState (or NFailedUnits) property on D-Bus. SystemState changes to
'degraded' (and NFailedUnits becomes 1) when the daemon crashes. However,
the unit can be set to auto-restart, and when it is, SystemState
immediately returns to 'running' (because the unit is in the
'activating auto-restart' state and is not failed anymore). So I don't
think I can use it here.
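
For reference, a minimal sketch of how the two manager properties mentioned
above could be read from a script. It shells out to busctl and parses its
single-value reply; the helper names (parse_busctl_value,
get_manager_property) are made up for illustration, and this only polls the
current value rather than subscribing to change signals.

```python
import shlex
import subprocess

# Well-known bus name, object path and interface of the systemd manager.
MANAGER = (
    "org.freedesktop.systemd1",
    "/org/freedesktop/systemd1",
    "org.freedesktop.systemd1.Manager",
)


def parse_busctl_value(output):
    """Parse a single-value busctl reply such as 's "degraded"' or 'u 1'."""
    signature, value = shlex.split(output.strip())
    # NFailedUnits is an unsigned integer ('u'); SystemState is a string ('s').
    return int(value) if signature == "u" else value


def get_manager_property(name):
    """Read one property of the systemd manager object via busctl (hypothetical helper)."""
    result = subprocess.run(
        ["busctl", "get-property", *MANAGER, name],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_busctl_value(result.stdout)
```

Calling get_manager_property("SystemState") right after the crash and again
once the auto-restart has kicked in would show the problem described above:
the second call no longer reports 'degraded'.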

I can think of a few ways to achieve this. Perhaps we could monitor the
ActiveState/SubState properties of all units, look for the unit
states that we consider bad (probably 'failed' and 'activating
auto-restart'?), and keep track of which units are (not) OK. I am not sure
whether this is a good idea, though. I think there might be better ways
which I do not currently see.
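
To make the idea concrete, here is a rough sketch of the bookkeeping only,
with no D-Bus wiring. It assumes something else feeds it
(unit, ActiveState, SubState) updates, for example from PropertiesChanged
signals. The class name HealthTracker and the rule "a unit is cleared only
when it reaches 'active'" are my assumptions, not anything systemd provides.

```python
class HealthTracker:
    """Track which units are currently considered unhealthy (illustrative only)."""

    def __init__(self):
        self.unhealthy = set()

    @staticmethod
    def is_bad(active_state, sub_state):
        # States we consider bad: 'failed', and 'activating' with
        # SubState 'auto-restart' (the unit crashed and is waiting to restart).
        if active_state == "failed":
            return True
        return active_state == "activating" and sub_state == "auto-restart"

    def update(self, unit, active_state, sub_state):
        """Record a state change for one unit and return the resulting light."""
        if self.is_bad(active_state, sub_state):
            self.unhealthy.add(unit)
        elif active_state == "active":
            # Assumption: only reaching 'active' clears a unit again, so the
            # light stays red through intermediate states such as 'start'.
            self.unhealthy.discard(unit)
        return self.light()

    def light(self):
        return "red" if self.unhealthy else "green"
```

With this, a crash followed by an auto-restart keeps the light red until the
unit reports 'active' again, and the light only turns green when no other
unit is still marked unhealthy.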

So -- is there a better way to get such information from systemd? We do
not care about the output while the manager is starting up or shutting down.

Thanks for any hints and/or pointers,
Tomas

