<div dir="ltr"> On May 11, 2016 12:07, "Lennart Poettering" <<a href="mailto:lennart@poettering.net" target="_blank">lennart@poettering.net</a>> wrote: > > On Wed, 11.05.16 11:27, Brian Kroth (<a href="mailto:bpkroth@gmail.com" target="_blank">bpkroth@gmail.com</a>) wrote: > > > Hi all, I'm in the midst of steeping myself in systemd docs as I prepare to > > face lift a slew of services for Debian Jessie updates. > > > > As I read through things I'm starting to think through a number of new ways > > I could potentially reorganize some of our services, which is cool. With my > > ideas though I think I'm finding a few gaps in either my understanding or > > systemd capabilities, so I wanted to send a few questions to the list. > > Hopefully this is the right place. > > > > The first should hopefully be a bit of a softball: > > > > With .service units one can specify OnFailure and other sorts of restart > > behaviors, including thresholds and backoffs for when to stop retrying and > > what to do then. Essentially a lightweight service problem escalation > > procedure. > > > > However, in reading systemd-system.conf, I don't see any way to specify > > something like DefaultOnFailure behavior for what to do on failure, perhaps > > after some simple restart attempts, for all services. Seems like it can > > only be done on a per unit basis, no? > > That is correct, yes. 
> > > Ideally, I'd like to be able to do something very simple, like declare > > that if any service fails to restart itself or does so too often and enters a > > hard failure state, then systemd should (attempt to) fire off an > > escalation procedure unit like send a passive check status to Nagios or > > send an email, accepting that such procedures may depend upon network > > connectivity which may or may not be available (so maybe there's some > > circular dependency issues to work through in such a scenario, but I > > presume systemd already has facilities for handling that case, maybe via > > OnFailureJobMode= settings). > > > > Thoughts? > > That sounds like it goes towards service monitoring? > > I figure our theory there was that monitoring systems should probably > keep an eye on the journal stream generated, where there are events > generated about these issues. These log entries are recognizable by > their message ID and carry both human readable as well as structured > metadata that let you know what's going on. Our plan was originally to > then add a concept of "activation-by-log-event" to systemd, so that > you could activate some service each time a log event of a certain > kind happens. However, we never came around to actually hack that up, > it's still on the TODO list. > > I think OnFailure= and stuff are pretty useful for some things, but > for the monitoring case such a journal-based logic would be nicer, > because it can cover events triggered at a quick pace and during early > boot nicer, as the processing of this can happen serially and > asynchronously... Also, it would allow much nicer filtering for any > kind of event on the system, and we wouldn't have to hook up every > kind of failure of each service with an OnFailure= like dependency. > > So yeah, I think we should have better support for what you are trying > to do, but I think we should best do that by delivering the > activate-by-log-message feature after all... 
> > Lennart Thanks, I'll look into that technique. Essentially in this case it'd be another .service script monitoring journal activity, perhaps with some filters, or else just a periodic cron job. Either way, I think you're right - that's probably the more generally applicable approach. I must admit I'd only done enough research/understanding of journald to get my syslog stuff working again. I hadn't really thought through what else it might offer/enable. Now that I have, I'm starting to see nice aspects to it. Too bad Debian Jessie is a little bit behind on a number of its features (coredumpctl) and its supporting cast (syslog-ng). Thanks, Brian </div>
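For what it's worth, here is a rough sketch of the "service monitoring journal activity, perhaps with some filters" idea, not a definitive implementation. It assumes the journal is streamed as one JSON object per line (as `journalctl -f -o json` produces) and filters on the message ID. The function name `failed_units` is made up for illustration, and both the message ID constant and the use of the `UNIT` field are assumptions that should be checked against `journalctl -o verbose` on the systemd version in use.

```python
import json

# Assumption: message ID systemd attaches to "unit failed" log entries
# (SD_MESSAGE_UNIT_FAILED in sd-messages.h). Verify it on your own
# systemd version before relying on it.
UNIT_FAILED_ID = "be02cf6855d2428ba40df7e9d022f03d"


def failed_units(lines, message_ids=(UNIT_FAILED_ID,)):
    """Yield (unit, message) for journal JSON lines with a wanted MESSAGE_ID.

    `lines` is any iterable of strings, e.g. sys.stdin fed by
    `journalctl -f -o json`. Hypothetical helper, not a systemd API.
    """
    wanted = set(message_ids)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            # Skip anything that is not a JSON document.
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("MESSAGE_ID") in wanted:
            # Assumption: the failing unit's name is carried in UNIT.
            yield entry.get("UNIT", "unknown"), entry.get("MESSAGE", "")
```

A small wrapper .service could then run something like `journalctl -f -o json | python3 watch.py`, where the (hypothetical) `watch.py` loops over `failed_units(sys.stdin)` and fires whatever escalation is wanted for each hit, such as a passive Nagios check result or an email. Because the entries are processed one at a time from a stream, a burst of failures is handled serially rather than spawning one job per failure.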