[systemd-devel] sometimes systemd takes a very long time to restart service

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Wed Jan 2 06:22:32 PST 2013


On Wed, Jan 02, 2013 at 02:28:19AM -0700, JB wrote:
> Andrey Borzenkov wrote:
> >В Tue, 01 Jan 2013 23:37:56 -0700
> >JB <general at itpsg.com> пишет:
> >>Andrey Borzenkov wrote:
> >>>В Tue, 01 Jan 2013 18:52:38 -0700
> >>>JB <general at itpsg.com> пишет:
> >>>>Thanks!  I'll try and it may work in my case.  What's
> >>>>interesting is that in your case it sounded like rsyslog was
> >>>>hanging around while it was having problems dealing with the
> >>>>condition of having the network unavailable.  In my case,
> >>>>webrickd actually stops and shuts down almost immediately
> >>>>but for some reason systemd doesn't or can't figure that
> >>>>out.
> >>>Showing "systemctl status webrickd.service" before restarting and
> >>>during restart may give some hints.
> >>Good idea.
> >>
> >>It goes from active/running to this immediately after restart...
> >>
> >>webrickd.service - Configuration ruby webrick daemon
> >>          Loaded: loaded (/etc/systemd/system/webrickd.service)
> >>          Active: deactivating (final-sigterm) since Tue, 01 Jan
> >>2013 22:44:39 -0700; 1min 14s ago
> >>         Process: 15771
> >>ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
> >>/home/rtuser/app/data/logs/webrickd.pid (code=exited,
> >>status=0/SUCCESS)
> >>
> >>Then this, anywhere from 1 to 3 minutes later...
> >>
> >>webrickd.service - Configuration ruby webrick daemon
> >>          Loaded: loaded (/etc/systemd/system/webrickd.service)
> >>          Active: active (running) since Tue, 01 Jan 2013
> >>22:47:40 -0700; 15s ago
> >>         Process: 15807
> >>ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
> >>/home/rtuser/app/data/logs/webrickd.pid (code=exited,
> >>status=0/SUCCESS)
> >
> >And initial state (i.e. after it is started)?
> >
> >>Any ideas why it sits in state "deactivating (final-sigterm)"
> >>for 1 to 3 minutes
> It gets better.  I took a very stripped simple daemon to try to
> understand the interaction between systemd and the process.  Here it
> is:
Hi,
your process is forking as it should (as least on my machine).
Putting systemd in debug mode 
    sudo kill -SIGRTMIN+22 1
and running
    sudo systemctl start webrickd.service
gives me:
Jan 02 08:55:27 spora sudo[20648]: zbyszek : TTY=pts/0 ; PWD=/home/zbyszek/src/systemd ; USER=root ; COMMAND=/bin/systemctl start webrickd.service
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Running GC...
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Trying to enqueue job webrickd.service/start/replace
Jan 02 08:55:27 spora systemd[1]: Installed new job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: Enqueued job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: About to execute: /usr/bin/ruby /home/zbyszek/src/systemd/test.rb
Jan 02 08:55:27 spora systemd[1]: Forked /usr/bin/ruby as 20651
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed failed -> start
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/webrickd_2eservice
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20651 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20651 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20651 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20651 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: control process exited, code=exited status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service got final SIGCHLD for state start
Jan 02 08:55:27 spora systemd[1]: Main PID guessed: 20653
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed start -> running
Jan 02 08:55:27 spora systemd[1]: Job webrickd.service/start finished, result=done
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20653 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20653 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20653 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20653 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: main process exited, code=exited, status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed running -> stop-sigterm
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20656 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20656 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20656 died (code=exited, status=1/FAILURE)
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
Jan 02 08:55:27 spora systemd[1]: webrickd.service: cgroup is empty
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed stop-sigterm -> dead
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 09:01:01 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
Jan 02 09:01:01 spora systemd[1]: Running GC...
Jan 02 09:01:01 spora systemd[1]: Collecting webrickd.service

With normal UNIX double forking, the process should fork twice, and
the first two should processes should exit. This is what happens,
but in the wrong order. The first process is only supposed to exit
after the main process is alive and ready to serve requests. As you
can see from the log above, the first process exits while the second
process is still running, and thus systemd thinks that this is the main
process. Subsequenty it exits, and the service is considered dead.

OTOH, your original problem (with restart taking >1 minute) is
something different. What systemd version exactly are you using? Can
you enable debugging and then post the output from the long 'systemctl
restart webrick.service'?

Zbyszek


More information about the systemd-devel mailing list