<p dir="ltr">Hi all, I'm in the midst of steeping myself in the systemd docs as I prepare to face-lift a slew of services for the Debian Jessie update.</p>
<p dir="ltr">As I read through things, I'm starting to think of a number of new ways I could reorganize some of our services, which is cool. In working through those ideas, though, I think I'm finding a few gaps in either my understanding or systemd's capabilities, so I wanted to send a few questions to the list. Hopefully this is the right place.</p>
<p dir="ltr">The first should hopefully be a bit of a softball:</p>
<p dir="ltr">With .service units one can specify OnFailure= and various restart behaviors, including thresholds and backoffs that determine when to stop retrying and what to do then. Essentially, a lightweight escalation procedure for service problems.</p>
<p dir="ltr">However, in reading systemd-system.conf, I don't see any way to specify something like a DefaultOnFailure= setting that says what to do on failure, perhaps after some simple restart attempts, for all services. It seems this can only be done on a per-unit basis, no?</p>
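<p dir="ltr">To make that concrete, here is a rough sketch of the per-unit setup I mean. The unit names, paths, and values are made up for illustration, and the option names are as I read them in systemd.unit(5) and systemd.service(5) for the version shipping in Jessie, so please correct me if I have any of them wrong:</p>
<pre>
# /etc/systemd/system/myapp.service (hypothetical example unit)
[Unit]
Description=Example application
# Hypothetical escalation unit to start once this one enters the failed state
OnFailure=notify-failure@%n.service

[Service]
ExecStart=/usr/local/bin/myapp
# Retry on unclean exits, waiting 5s between attempts
Restart=on-failure
RestartSec=5
# Stop retrying after 3 starts within 60s
StartLimitInterval=60
StartLimitBurst=3
</pre>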
<p dir="ltr">Ideally, I'd like to be able to declare something very simple: if any service fails to restart itself, or restarts too often and enters a hard failure state, then systemd should (attempt to) start an escalation unit that, for example, sends a passive check result to Nagios or sends an email. I accept that such procedures may depend on network connectivity, which may or may not be available, so there may be some circular dependency issues to work through in that scenario. I presume systemd already has facilities for handling that case, maybe via the OnFailureJobMode= setting.</p>
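<p dir="ltr">As a sketch of what such an escalation unit might look like (the template name and the helper script are hypothetical; only the standard unit options are real):</p>
<pre>
# /etc/systemd/system/notify-failure@.service (hypothetical template unit)
[Unit]
Description=Escalate failure of %i

[Service]
Type=oneshot
# Hypothetical site-local script that mails an admin or submits a passive
# check result to Nagios; %i is the name of the failed unit
ExecStart=/usr/local/bin/notify-failure %i
</pre>
<p dir="ltr">Each service would then need OnFailure=notify-failure@%n.service in its own [Unit] section, which is exactly the per-unit repetition I'd like to avoid.</p>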
<p dir="ltr">Thoughts?</p>
<p dir="ltr">Thanks,<br>
Brian</p>