[systemd-devel] sometimes systemd takes a very long time to restart service

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Fri Jan 4 10:51:35 PST 2013


On Thu, Jan 03, 2013 at 01:41:57PM -0700, JB wrote:
> Zbigniew Jędrzejewski-Szmek wrote:
> >On Wed, Jan 02, 2013 at 02:28:19AM -0700, JB wrote:
> >>Andrey Borzenkov wrote:
> >>>В Tue, 01 Jan 2013 23:37:56 -0700
> >>>JB <general at itpsg.com> пишет:
> >>>>Andrey Borzenkov wrote:
> >>>>>В Tue, 01 Jan 2013 18:52:38 -0700
> >>>>>JB <general at itpsg.com> пишет:
> >>>>>>Thanks!  I'll try and it may work in my case.  What's
> >>>>>>interesting is that in your case it sounded like rsyslog was
> >>>>>>hanging around while it was having problems dealing with the
> >>>>>>condition of having the network unavailable.  In my case,
> >>>>>>webrickd actually stops and shuts down almost immediately
> >>>>>>but for some reason systemd doesn't or can't figure that
> >>>>>>out.
> >>>>>Showing "systemctl status webrickd.service" before restarting and
> >>>>>during restart may give some hints.
> >>>>Good idea.
> >>>>
> >>>>It goes from active/running to this immediately after restart...
> >>>>
> >>>>webrickd.service - Configuration ruby webrick daemon
> >>>>         Loaded: loaded (/etc/systemd/system/webrickd.service)
> >>>>         Active: deactivating (final-sigterm) since Tue, 01 Jan
> >>>>2013 22:44:39 -0700; 1min 14s ago
> >>>>        Process: 15771
> >>>>ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
> >>>>/home/rtuser/app/data/logs/webrickd.pid (code=exited,
> >>>>status=0/SUCCESS)
> >>>>
> >>>>Then this, anywhere from 1 to 3 minutes later...
> >>>>
> >>>>webrickd.service - Configuration ruby webrick daemon
> >>>>         Loaded: loaded (/etc/systemd/system/webrickd.service)
> >>>>         Active: active (running) since Tue, 01 Jan 2013
> >>>>22:47:40 -0700; 15s ago
> >>>>        Process: 15807
> >>>>ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
> >>>>/home/rtuser/app/data/logs/webrickd.pid (code=exited,
> >>>>status=0/SUCCESS)
> >>>And initial state (i.e. after it is started)?
> >>>
> >>>>Any ideas why it sits in state "deactivating (final-sigterm)"
> >>>>for 1 to 3 minutes
> >>It gets better.  I took a very stripped simple daemon to try to
> >>understand the interaction between systemd and the process.  Here it
> >>is:
> >Hi,
> >your process is forking as it should (as least on my machine).
                   ^not (arghhh!)

> >Putting systemd in debug mode     sudo kill -SIGRTMIN+22 1
> >and running
> >    sudo systemctl start webrickd.service
> >gives me:
> >Jan 02 08:55:27 spora sudo[20648]: zbyszek : TTY=pts/0 ; PWD=/home/zbyszek/src/systemd ; USER=root ; COMMAND=/bin/systemctl start webrickd.service
> >Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
> >Jan 02 08:55:27 spora systemd[1]: Running GC...
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
> >Jan 02 08:55:27 spora systemd[1]: Trying to enqueue job webrickd.service/start/replace
> >Jan 02 08:55:27 spora systemd[1]: Installed new job webrickd.service/start as 2851
> >Jan 02 08:55:27 spora systemd[1]: Enqueued job webrickd.service/start as 2851
> >Jan 02 08:55:27 spora systemd[1]: About to execute: /usr/bin/ruby /home/zbyszek/src/systemd/test.rb
> >Jan 02 08:55:27 spora systemd[1]: Forked /usr/bin/ruby as 20651
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service changed failed -> start
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/webrickd_2eservice
> >Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20651 (ruby).
> >Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20651 (ruby)
> >Jan 02 08:55:27 spora systemd[1]: Child 20651 died (code=exited, status=0/SUCCESS)
> >Jan 02 08:55:27 spora systemd[1]: Child 20651 belongs to webrickd.service
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service: control process exited, code=exited status=0
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service got final SIGCHLD for state start
> >Jan 02 08:55:27 spora systemd[1]: Main PID guessed: 20653
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service changed start -> running
> >Jan 02 08:55:27 spora systemd[1]: Job webrickd.service/start finished, result=done
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
> >Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20653 (ruby).
> >Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20653 (ruby)
> >Jan 02 08:55:27 spora systemd[1]: Child 20653 died (code=exited, status=0/SUCCESS)
> >Jan 02 08:55:27 spora systemd[1]: Child 20653 belongs to webrickd.service
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service: main process exited, code=exited, status=0
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service changed running -> stop-sigterm
> >Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20656 (ruby).
> >Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20656 (ruby)
> >Jan 02 08:55:27 spora systemd[1]: Child 20656 died (code=exited, status=1/FAILURE)
> >Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service: cgroup is empty
> >Jan 02 08:55:27 spora systemd[1]: webrickd.service changed stop-sigterm -> dead
> >Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
> >Jan 02 09:01:01 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
> >Jan 02 09:01:01 spora systemd[1]: Running GC...
> >Jan 02 09:01:01 spora systemd[1]: Collecting webrickd.service
> >
> >With normal UNIX double forking, the process should fork twice, and
> >the first two should processes should exit. This is what happens,
> >but in the wrong order. The first process is only supposed to exit
> >after the main process is alive and ready to serve requests. As you
> >can see from the log above, the first process exits while the second
> >process is still running, and thus systemd thinks that this is the main
> >process. Subsequenty it exits, and the service is considered dead.
> Ahh, that makes tremendous sense.  Thank you so much for taking a
> look at that.  I'm going to have to read up on how to put systemd in
> debug mode.
Everything should be described in systemd(1). Cf. sections SIGNALS and KERNEL
COMMAND LINE.

> I've written daemon's before in C where I have complete
> control over this.  I wonder if the forking implementation Ruby's
> daemonize has this backwards.  The more I learn about Ruby the more
> frustrated I become with it.  I've worked around this problem by
> using the "Simple" type and not daemonizing the process.
> >OTOH, your original problem (with restart taking >1 minute) is
> >something different. What systemd version exactly are you using? Can
> >you enable debugging and then post the output from the long 'systemctl
> >restart webrick.service'?
> I'm using whatever comes with FC15.  According to the log it's
> version 26 (RPM is 26-18)
Oh, this is really really old. Please retest with latest FC.

> I'm currently in crisis with this app and have already stripped the
> daemon code out and rewritten accordingly.  When I have this
> released, I'll go ahead and run the test you suggested but I think
> they might be related.

Zbyszek


More information about the systemd-devel mailing list