[systemd-devel] Randomly on shutdown, stop timeout for user at .service (repeated report, different user)

Toms Seisums toms.seisums at gmail.com
Fri Oct 4 09:31:43 PDT 2013


@Andrey, the full log can be found here, the link was there before also:
http://pastebin.com/wbr04AQw (systemd-207)
It's quite silent about the cause for the timeout, though.

I've been debugging this for three days now, and I've crystallized a little
bit of information.

1. If I boot the system and stay on login screen, no matter when I hit
Ctrl+Alt+Del (ASAP or after an hour), the system will always reboot
normally.
2. If I login, and instantly type `reboot`, in about 75% the system will
reboot normally, the other 25% being slow, resulting in user at 0.servicetimeout.
2.1. If I login and wait for some time, the next reboot is going to be slow.
3. I'm using Arch, and they had a bug report already there -
https://bugs.archlinux.org/task/37007, it has come down to that systemd-208
fixes it, though, I've updated and the problem still remains.
4. I've attempted numerous daemon configurations, but the results are
inconsistent. Due to that Arch issue being closed and not reopened, I did
this, in order to find out whether running daemons aren't at fault.
4.1. I, for some reboot's disabled every daemon (not also running them
after the system booted up), and the reboots/shutdowns were still slow.
4.1.1. The Ctrl+Alt+Del before logging in, is still fast.
5. Downgrading to systemd-204 works flawlessly, also removing the issue
that's described in next point.
6. Whenever I boot with systemd-207 or systemd-208 and login, I am
presented with:

    Failed to open private bus connection: Failed to connect to socket
/run/user/0/dbus/user_bus_socket: No such file or directory

6.1. The issue appears to already be reported by @Andrey - here:
http://lists.freedesktop.org/archives/systemd-devel/2013-September/013393.html
7. Apparently, I specially set up a system running in VBox, also an Arch
Linux, that didn't have the problem neither running with systemd-207,
neither has it with systemd-208 (that actually led me to disable the
daemons).
7.1. I did encounter it once though, but since then, it hasn't happened.
8. If, before running shutdown, I manually: `systemctl stop user at 0.service`,
the system shuts down prefectly.

Here are some logs of shutdown with systemd-208:

 1. Quick, Ctrl+Alt+Del before logging in. (1: http://pastebin.com/5X0sUHCV)
 2. Slow, executed a little later after logging in. (2:
http://pastebin.com/tjnLNJKH)
 3. Slow, executed by Ctrl+Alt+Del after logging in, doing some tasks, then
`logout`. (3: http://pastebin.com/4hEmRYen)
 4. I wanted to catch a fast reboot with a full login and full daemons
running, but couldn't manage to - they all were slow.
 5. It also happens if I log in with a different user than root and push
Ctrl+Alt+Del. (4: http://pastebin.com/U36WqDzP)
 5.1. This is just a test for you @Lennart, just to see if a different user
changes anything.
 6. `systemctl stop user at 0.service` to `reboot`, that results in a quick
one. (5: http://pastebin.com/aJ0s32MR) (I, for a second thought that Apache
might be causing some issues, because it almost always gets killed, so, I
disabled it for some tests, this log has no Apache information - that's
just in case you're wondering why the logs don't match. More on Apache
adventures below...)
 6.1. I reran it after Apache tests to clarify, quick reboot the first time
(9: http://pastebin.com/HcUA72AE). The second time. (10:
http://pastebin.com/rf96QiGA) The third time. (11:
http://pastebin.com/bpFu1sQK) All in a row. I did 2 more to test for the
next point, all quick.
 6.2. This doesn't seem to change the SFTP behavior, though.

An extra piece of information I can provide, is that, this appears to
happen to break SFTP. If I boot, then connect from my other PC with SFTP
(as root), and initiate a reboot from tty0, my SFTP connection will not be
dropped. It will actually never even timeout. I could restart my Linux
machine for about 100times, and the SFTP window will not even blink,
thinking that it's connected. Only when I refresh the SFTP window, I get an
error message, asking me to reconnect. With systemd-204 there is no such an
issue - it disconnects immediately. Though, on my VBox Arch, there wasn't
such an issue with systemd-207, nor with systemd-208.


And @Lennart, you asked, I delivered, here is a log with
user at 0.servicestart/stop with 0.5second intervals between start/stop
calls. I stripped
the booting sequence, starting the log off at about where the system ended
it's boot. (12: http://sharetext.org/zJc, the link is different because I
exceeded pastebins free limit, doh).
Nothing special there, except that dbus thingy.

---

Apache:

So, noticing that Apache gets killed most of time, and systemd-204 also had
some delays where it waited for Apache to exit, I thought that the problem
is with Apache.

I started this debugging scenario by simply doing a "systemctl stop httpd"
into "reboot", that made my system to reboot quickly. (6:
http://pastebin.com/4NgxVAeP)

Though, one `httpd` instance still remains (occasionally, when I "systemctl
stop httpd", "ps aux" immediately after - will still display one httpd
process), that appears to actually be a bug in Apache's (I guess), that
appears to be the one that has to be explicitly killed at the end of reboot
process.

By partially confirming, that this is because of Apache, I stopped it upon
next boot, disabled it, and rebooted. Starting freshly with "ps aux"
reporting no signs of httpd, I attempted to "reboot" - guess what? That's a
long one. Three times in a row with no httpd upon boot - all reboots were
long, I guess that's not Apache then.

While writing the above and debugging, after having enabled Apache back and
stopping it upon boot, even though previous reboots executed immediately,
one didn't. (7: http://pastebin.com/3FnikCX9) Apparently the next one after
it also didn't. (8: http://pastebin.com/pnWhuQWC)

---

P.S. While debugging this, and trying to link some of the problems to some
of the daemons, I'm starting to feel as if I'd hit a delusion spree. While
occasionally it reboots fast, it might as well be just the random
quick-reboot I've been hunting before. Just that stopping those daemons
makes me think the problem is tied to them.


---

Loads of information, some of it are probably total duplicates, but I hope
I've given enough and that you'll be able to understand what I've provided.
More importantly, I hope that this will help to resolve these issues.


On 1 October 2013 03:43, Lennart Poettering <lennart at poettering.net> wrote:

> On Mon, 30.09.13 18:55, Toms Seisums (toms.seisums at gmail.com) wrote:
>
> > It appears that history repeats itself.
> >
> > Once (July 2013) a similar issue was reported here, by different user:
> >
> http://lists.freedesktop.org/archives/systemd-devel/2013-July/012283.html
> >
> > I'm also using Arch Linux, systemd-207, linux-3.11.2.
> >
> > On shutdown and reboot, user at 0.service occasionally freezes my system
> until
> > it times out and gets killed.
> >
> > Steps to reproduce:
> > 1) Boot
> > 2) Login
> > 3) Execute either systemctl reboot or reboot or shutdown -h now or
> whatever.
> >
> > I've went through the steps here, to get a proper shutdown log:
> >
> http://freedesktop.org/wiki/Software/systemd/Debugging/#shutdowncompleteseventually
> >
> > And, the shutdown log can be found here: http://pastebin.com/wbr04AQw
> >
> > The actual lines related to issue:
> >
> > #2223 [   74.387130] systemd[1]: Got D-Bus request:
> > org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
> > #2224 [  163.814925] systemd[1]: user at 0.service stopping timed out.
> Killing.
> > #2225 [  163.815447] systemd[1]: user at 0.service changed stop-sigterm ->
> > stop-sigkill
> > #2226 [  163.816063] systemd[1]: Received SIGCHLD from PID 326 (systemd).
> > #2227 [  163.816093] systemd[1]: Got SIGCHLD for process 326 (systemd)
> > #2228 [  163.816153] systemd[1]: Child 326 died (code=killed,
> status=9/KILL)
> > #2229 [  163.816158] systemd[1]: Child 326 belongs to user at 0.service
> > #2230 [  163.816171] systemd[1]: user at 0.service: main process exited,
> > code=killed, status=9/KILL
> > #2231 [  163.816183] systemd[1]: user at 0.service changed stop-sigkill ->
> > failed
> >
> > I currently haven't found out a pattern for this. I have attempted to
> quit
> > some processes before initiating the shutdown/reboot, but the result is
> > random.
> >
> > About 95% of shutdowns/reboots it will be a long one.
> > 3% it will be noticeably longer than the quickest one, but shorter than
> the
> > above one.
> > Remaining 2%, the shutdown will be very quick.
>
> Hmm, this is the systemd user instance for root which fails to shut
> down. Can you reproduce this if you start/stop "user at 0.service" quickly
> in a loop?
>
> Lennart
>
> --
> Lennart Poettering - Red Hat, Inc.
>



-- 
Toms Seisums

- (+371) 26431391

May the force be with you, always!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20131004/840ecc74/attachment.html>


More information about the systemd-devel mailing list