[systemd-devel] Restart and RestartSec in packaged .service files

Wed Sep 7 08:05:52 PDT 2011

Lennart Poettering <lennart <at> poettering.net> writes:
> > I was wondering if there is some kind of guideline about whether
> > packaged .service files in Fedora, etc. should specify Restart=,
> > RestartSec=, etc.
[...]
> There's currently no policy on this, but I generally do believe it would
> make a lot of sense to automatically respawn most services when they
> crash.

I admin a couple of (web) servers too, and I have to say I agree. I
think I represent the GNOME audience of sysadms - the damn thing
should just work out of the box and if it's broken, fix itself. Let me
just point out some things I've learned over the years with Monit.
Conceptually, it's less complicated than one thinks.

First, one has to accept that services sometimes die for no good
reason (cosmic ray or weird thread-related bug in some Apache module
that happens once a year). If you do not set up baby-sitting, you'll
be fine for half a year, and then you're toast. I think lazy sysadms
like me all learn it the hard way. I honestly believe it's a big bug
in a distro that you can install Apache and not get baby-sitting
out of the box.

Now when a problem happens, either it is fixable by a restart
(perfect, almost no down time), disappears by itself (some
network-related problems), or human intervention is required (I or
some other developer made a mistake, or there's a hardware problem).
Trying out a restart is usually safe once the service has failed.

If, and only if, human intervention is required, I want a notification
(i.e. email). Ideally, only one. Logs are good for diagnostics, but I
don't want to check logs regularly; I want the system to tell me
what's wrong when it needs help.

Of course, if something fails all the time, it's probably going to
require human intervention even if a restart fixes it intermittently.
Say Apache died 10 times today - that's abnormal in my book, and I
would like to get a notification.
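
To make the policy above concrete, here is a rough sketch in Python.
It is not how Monit or systemd implement it; the class name, the
"restart"/"notify" action strings and the thresholds (10 failures in
24 hours) are assumptions made up for illustration. The idea is simply:
always try a restart, and send one notification once failures pile up
within a time window.

```python
import time
from collections import deque


class RestartPolicy:
    """Decide what to do each time a supervised service is found dead.

    Hypothetical helper: max_failures and window_seconds are
    illustrative thresholds, not defaults from Monit or systemd.
    """

    def __init__(self, max_failures=10, window_seconds=24 * 60 * 60,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.failures = deque()     # timestamps of recent failures
        self.notified = False       # one mail, not one per crash

    def on_failure(self):
        """Return the actions to take: "restart", possibly "notify"."""
        now = self.clock()
        self.failures.append(now)
        # Forget failures that have fallen out of the window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()

        # Trying a restart is usually safe once the service has failed.
        actions = ["restart"]
        if len(self.failures) >= self.max_failures:
            if not self.notified:
                actions.append("notify")
                self.notified = True
        else:
            # Things calmed down; allow a notification next time.
            self.notified = False
        return actions
```

A supervisor loop would call on_failure() whenever it notices the
process is gone and act on the returned list. The notified flag is
what keeps it to a single email while the service keeps flapping.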

Of course, there are some extra opportunities for checking whether the
service is working, e.g. you can check a web app with an HTTP request
and see if you get a 200 back.

In my experience, these are nice as a kind of unit test so you don't
accidentally break something after an upgrade or a change, but they
find fewer errors than the simple "is the process running" check.
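
For what it's worth, the HTTP check mentioned above can be as small as
the following Python sketch. The function name and the 5 second
timeout are my own choices for illustration, not anything Monit or
systemd provide. It uses only the standard library and treats anything
other than a final 200 response (connection refused, timeout, 4xx/5xx,
malformed URL) as a failed check.

```python
import http.client
import urllib.error
import urllib.request


def http_check(url, timeout=5.0):
    """Return True if a GET to url ends in a 200 response, else False.

    Hypothetical helper; redirects are followed by urlopen, so this
    looks at the status of the final response.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # HTTPError (4xx/5xx) is a subclass of URLError; OSError covers
        # socket timeouts; ValueError covers malformed URLs.
        return False
```

Something like http_check("http://localhost/") run every minute or so
would be the "do I get a 200 back" test; it only makes sense in
addition to the plain "is the process running" check, not instead of
it.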

Anyway, I'm not saying systemd should do all this by itself; from a
web server admin perspective I just think it would be neat if we could
move closer to this experience with a default distro server install.
Of course, people will want to tweak things, and that's fine, but
that's no argument against a good default setup.

-- 
Ole Laursen
http://people.iola.dk/olau/