[systemd-devel] "Inactive/dead" services that are enabled are indistinguishable from unused or oneshot services

Thu Mar 17 14:48:36 PDT 2011

On Thu, 17.03.11 10:20, Mike Kazantsev (mk.fraggod at gmail.com) wrote:

> On Thu, 17 Mar 2011 01:39:19 +0100
> Lennart Poettering <lennart at poettering.net> wrote:
> 
> > On Thu, 24.02.11 13:55, Mike Kazantsev (mk.fraggod at gmail.com) wrote:
> > 
> > > Something like "systemctl --enabled" would certainly be much more
> > > useful for such cases than the current "systemctl --all", yet
> > > there will still be a lot of "oneshot" stuff, which is supposed to be
> > > dead, so a separate state for "!oneshot && enabled && exited" services
> > > like "stopped" (in place of "inactive") and maybe a view like "systemctl
> > > --stopped" would be of great help from my sysadmin's perspective.
> > 
> > Hmm, thinking about this: wouldn't it be a lot more useful for your case
> > if we added an option which causes a service to enter the failed state
> > if it exits cleanly but for no reason, i.e. without systemd having asked
> > it to?
> > 
> > Or maybe that should even be the default for most services? After all,
> > apart from services which implement exit-on-idle, no service should exit
> > cleanly just for fun without being asked to...
> > 
> 
> I think it'd be an improvement, but that'd also give the "failed" state
> a bit more ambiguity, although maybe that's not such a bad thing.
> 
> Over several reboots of a machine with 50+ enabled daemons, I've
> noticed that some of them (mostly the ones started via some "launcher
> script" like apachectl, pg_ctl, ejabberdctl, etc.) tend to "cleanly"
> fail at random on start just because the GuessMainPID= mechanism
> fails and systemd actually kills the service.

Hmm, GuessMainPID= fails? Do you know why exactly? Ideas for improvements?

The current logic is pretty simple: we look for all processes in the
service cgroup which have PPID == 1. If there is exactly one of these, we
assume it is the main process. So in your case there must be more than
one process for which this condition holds? Any recommendations on what
else we could check?
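As a rough illustration of the heuristic described above, here is a Python sketch. It assumes the caller has already collected the PIDs in the service's cgroup together with their parent PIDs; the helper `read_ppid` shows one way to get a PPID from procfs on Linux. Both function names are hypothetical, and this is not systemd's actual C implementation.

```python
from typing import Mapping, Optional


def read_ppid(pid: int) -> int:
    """Read a process's parent PID from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so only split what comes after the last ')'.
    fields = stat.rpartition(")")[2].split()
    # fields[0] is the process state, fields[1] is the PPID.
    return int(fields[1])


def guess_main_pid(cgroup_procs: Mapping[int, int]) -> Optional[int]:
    """Guess a service's main PID from a {pid: ppid} map of its cgroup.

    A daemon forked by a launcher script that has since exited is
    typically reparented to PID 1, so only PPID == 1 processes qualify.
    """
    candidates = [pid for pid, ppid in cgroup_procs.items() if ppid == 1]
    if len(candidates) == 1:
        # Exactly one candidate: assume it is the main process.
        return candidates[0]
    # Zero or several candidates: the guess is ambiguous, so give up.
    return None
```

For example, `guess_main_pid({1200: 1, 1201: 1200})` returns 1200, while `guess_main_pid({1300: 1, 1301: 1})` returns `None`: a launcher that leaves two daemonized processes behind defeats the guess.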

> I understand that there's a limited number of reasons for such a "clean
> stop" (manual interaction, units like rsyslog.service, Conflicts=,
> isolate, etc.), but it's still the wrong way to approach this particular
> problem.
> 
> I've solved the problem for myself by writing a simple dbus-python
> script (http://goo.gl/V6e7V). It shows exactly everything that's
> enabled and not active (with "oneshot" exception), not some random
> subset of this.

Hmm, yup. I agree, this is very useful. I've added this to the todo list now.
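A minimal Python sketch of the selection Mike's script performs: enabled units that are not active, leaving out "oneshot" services. The `UnitInfo` record and its field names are hypothetical stand-ins; the original script queried systemd over D-Bus, and this sketch reproduces only the filtering logic, not that query.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UnitInfo:
    # Hypothetical record; fill it from whatever source you query.
    name: str
    enabled: bool        # unit is enabled (hooked in by some target)
    active_state: str    # e.g. "active", "inactive", "failed"
    service_type: str    # e.g. "simple", "forking", "oneshot"


def enabled_but_not_active(units: Iterable[UnitInfo]) -> List[str]:
    """Return sorted names of enabled, non-oneshot units that are not active."""
    return sorted(
        u.name
        for u in units
        if u.enabled
        and u.active_state != "active"
        # oneshot services are expected to be inactive after they have run
        and u.service_type != "oneshot"
    )
```

With this filter, an enabled "forking" service sitting in the "failed" or "inactive" state is listed, while an enabled "oneshot" unit that has already run, or a disabled unit, is not.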

> Unfortunately, the new rsyslog.service (and services using "systemctl
> stop" directly) can affect such a display, which I think exposes a flawed
> assumption on my part: that "enabled" in systemd means "should be active,
> period" (with the exception of "oneshot" units). I don't know an easy
> solution to this, short of adding another enabled-like state.

Hmm, yeah. This problem is hard. But I think simply showing "enabled but
not running" is already quite useful, even though a service on that list
is not necessarily buggy; it may just not be hooked in by anything.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.