<div dir="ltr"><p dir="ltr"><br>
On May 11, 2016 12:07, "Lennart Poettering" <<a href="mailto:lennart@poettering.net" target="_blank">lennart@poettering.net</a>> wrote:<br>
><br>
> On Wed, 11.05.16 11:27, Brian Kroth (<a href="mailto:bpkroth@gmail.com" target="_blank">bpkroth@gmail.com</a>) wrote:<br>
><br>
> > Hi all, I'm in the midst of steeping myself in systemd docs as I prepare to<br>
> > face lift a slew of services for Debian Jessie updates.<br>
> ><br>
> > As I read through things I'm starting to think through a number of new ways<br>
> > I could potentially reorganize some of our services, which is cool. With my<br>
> > ideas though I think I'm finding a few gaps in either my understanding or<br>
> > systemd capabilities, so I wanted to send a few questions to the list.<br>
> > Hopefully this is the right place.<br>
> ><br>
> > The first should hopefully be a bit of a softball:<br>
> ><br>
> > With .service units one can specify OnFailure and other sorts of restart<br>
> > behaviors, including thresholds and backoffs for when to stop retrying and<br>
> > what to do then. Essentially a lightweight service problem escalation<br>
> > procedure.<br>
> ><br>
> > However, in reading systemd-system.conf, I don't see any way to specify<br>
> > something like DefaultOnFailure behavior for what to do on failure, perhaps<br>
> > after some simple restart attempts, for all services. Seems like it can<br>
> > only be done on a per unit basis, no?<br>
><br>
> That is correct, yes.<br>
><br>
> > Ideally, I'd like to be able to do something very simple, like declare that<br>
> > if any service fails to restart itself or does so too often and enters a<br>
> > hard failure state, then systemd should (attempt to) fire off an<br>
> > escalation procedure unit like send a passive check status to Nagios or<br>
> > send an email, accepting that such procedures may depend upon network<br>
> > connectivity which may or may not be available (so maybe there's some<br>
> > circular dependency issues to work through in such a scenario, but I<br>
> > presume systemd already has facilities for handling that case, maybe via<br>
> > OnFailureJobMode= settings).<br>
> ><br>
> > Thoughts?<br>
><br>
> That sounds like it goes towards service monitoring?<br>
><br>
> I figure our theory there was that monitoring systems should probably<br>
> keep an eye on the journal stream generated, where there are events<br>
> generated about these issues. These log entries are recognizable by<br>
> their message ID and carry both human readable as well as structured<br>
> metadata that let you know what's going on. Our plan was originally to<br>
> then add a concept of "activation-by-log-event" to systemd, so that<br>
> you could activate some service each time a log event of a certain<br>
> kind happens. However, we never came around to actually hack that up,<br>
> it's still on the TODO list.<br>
><br>
> I think OnFailure= and stuff are pretty useful for some things, but<br>
> for the monitoring case such a journal-based logic would be nicer,<br>
> because it can cover events triggered at a quick pace and during early<br>
> boot more nicely, as the processing of this can happen serially and<br>
> asynchronously... Also, it would allow much nicer filtering for any<br>
> kind of event on the system, and we wouldn't have to hook up every<br>
> kind of failure of each service with an OnFailure= like dependency.<br>
><br>
> So yeah, I think we should have better support for what you are trying<br>
> to do, but I think we should best do that by delivering the<br>
> activate-by-log-message feature after all...<br>
><br>
> Lennart</p>
<p dir="ltr">Thanks, I'll look into that technique.</p><p dir="ltr">Essentially in this case it'd be another .service script monitoring journal activity, perhaps with some filters, or else just a periodic cron job. Either way, I think you're right - that's probably the more generally applicable approach.<br></p>
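<p dir="ltr">For the record, here is a minimal sketch of what such a journal-watching service might run, not a tested implementation. It assumes that unit failures are logged with the message ID below (which I believe is SD_MESSAGE_UNIT_FAILED from sd-messages.h, but that should be verified against the installed systemd) and that the entries carry UNIT and MESSAGE fields. The failed_units and escalate names are made up for illustration.</p>

```python
import json
import subprocess

# Assumption: believed to be SD_MESSAGE_UNIT_FAILED from systemd's
# sd-messages.h; verify against the installed systemd before relying on it.
UNIT_FAILED_ID = "be02cf6855d2428ba40df7e9d022f03d"


def failed_units(lines, message_id=UNIT_FAILED_ID):
    """Yield (unit, message) for each JSON journal line with the given MESSAGE_ID."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("MESSAGE_ID") != message_id:
            continue
        yield entry.get("UNIT", "unknown"), entry.get("MESSAGE", "")


def escalate(unit, message):
    """Hypothetical hook: replace with a Nagios passive check or an email."""
    print("ESCALATE {}: {}".format(unit, message))


def main():
    # Follow only new journal entries, one JSON object per line,
    # pre-filtered by journalctl on the message ID.
    cmd = ["journalctl", "--follow", "--lines=0", "--output=json",
           "MESSAGE_ID=" + UNIT_FAILED_ID]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    for unit, message in failed_units(proc.stdout):
        escalate(unit, message)


if __name__ == "__main__":
    main()
```

<p dir="ltr">Keeping the parsing in failed_units separate from the journalctl call means the filter can be exercised with canned JSON lines, and the same function would work for the cron variant by feeding it the output of a bounded journalctl query instead of a followed one.</p>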
<p dir="ltr">I must admit I'd only done enough research into journald to get my syslog stuff working again. I hadn't really thought through what else it might offer or enable. Now that I have, I'm starting to see nice aspects to it. Too bad Debian Jessie is a little behind on some of its own features (coredumpctl) and on those of its supporting cast (syslog-ng).</p>
<p dir="ltr">Thanks,<br>
Brian<br>
</p>
</div>