[systemd-devel] A server says: "System is going down." But never does.

Tue Nov 28 15:02:22 UTC 2017

Hi,

This mailing list is the only place where I expect to have some
helpful feedback. But feel free to suggest other places. I'd like to
investigate situation I have now, find out what went wrong and prevent
it from happening again if possible. Your help is appreciated.

Like I said, a server reports that it's going down, when I ssh to it
as root. As a non-root user, it says that and closes the connection.

In the journal I see a lot of this:

Nov 28 16:22:01 st2 systemd-journal[353]: Journal stopped
Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:22:01 st2 systemd-journal[494]: Journal started
Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service watchdog
timeout (limit 1min)!
Nov 28 16:22:01 st2 systemd-journald[353]: Received SIGTERM from PID 1
(systemd).
Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journald.service entered
failed state.
Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service has no
holdoff time, scheduling restart.
Nov 28 16:22:01 st2 systemd[1]: Stopping Journal Service...
Nov 28 16:22:01 st2 systemd[1]: Starting Journal Service...
Nov 28 16:22:01 st2 systemd[1]: Started Journal Service.
Nov 28 16:22:01 st2 systemd[1]: Starting Trigger Flushing of Journal
to Persistent Storage...
Nov 28 16:22:01 st2 systemd[1]: systemd-journal-flush.service: main
process exited, code=exited, status=1/FAILURE
Nov 28 16:22:01 st2 systemd[1]: Failed to start Trigger Flushing of
Journal to Persistent Storage.
Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journal-flush.service
entered failed state.

Nov 28 16:22:52 st2 systemd[1]: systemd-timesyncd.service start
operation timed out. Terminating.
Nov 28 16:22:52 st2 systemd[1]: Failed to start Network Time Synchronization.
Nov 28 16:22:52 st2 systemd[1]: Unit systemd-timesyncd.service entered
failed state.
Nov 28 16:22:53 st2 systemd[1]: systemd-timesyncd.service has no
holdoff time, scheduling restart.
Nov 28 16:22:53 st2 systemd[1]: Stopping Network Time Synchronization...
Nov 28 16:22:53 st2 systemd[1]: Starting Network Time Synchronization...

Nov 28 16:23:02 st2 systemd-journal[494]: Journal stopped
Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:23:02 st2 systemd-journal[632]: Journal started
Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service watchdog
timeout (limit 1min)!
Nov 28 16:23:02 st2 systemd-journald[494]: Received SIGTERM from PID 1
(systemd).
Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journald.service entered
failed state.
Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service has no
holdoff time, scheduling restart.
Nov 28 16:23:02 st2 systemd[1]: Stopping Journal Service...
Nov 28 16:23:02 st2 systemd[1]: Starting Journal Service...
Nov 28 16:23:02 st2 systemd[1]: Started Journal Service.
Nov 28 16:23:02 st2 systemd[1]: Starting Trigger Flushing of Journal
to Persistent Storage...
Nov 28 16:23:02 st2 systemd[1]: systemd-journal-flush.service: main
process exited, code=exited, status=1/FAILURE
Nov 28 16:23:02 st2 systemd[1]: Failed to start Trigger Flushing of
Journal to Persistent Storage.
Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journal-flush.service
entered failed state.

It repeats itself every minute.

systemctl doesn't work:

# systemctl
Failed to get D-Bus connection: Connection refused

I have 16 lxc containers running on the server:

# lxc-ls -f | grep RUNNING | wc -l
16

and 16 dbus-daemon's (so supposedly one dbus-daemon is missing):

# ps -ef | grep dbus
message+   845     1  0 Feb15 ?        00:09:56 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  1615   579  0 Jun13 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
root      1673 28602  0 16:26 pts/31   00:00:00 grep dbus
systemd+  3761  3461  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  4635  3436  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  4767  3527  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  5344  3597  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  5714  3664  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  5793  3750  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  7856  7198  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+  9477  8848  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 10930 10322  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 13130 10717  0 Jun27 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 13300 11339  0 Apr03 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 19689 19360  0 Jul28 ?        00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 21045 20562  0 Oct19 ?        00:00:01 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation

# ps -ef | grep dbus | wc -l
16

My conjecture is that the first dbus-daemon is of the physical host,
since it has ppid == 1, and user messagebus.

On Nov 21 in the log I can see supposedly restart, starting with:

Nov 21 19:55:27 st2 systemd[320]: systemd 215 running in system mode.
(+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
-SECCOMP -APPARMOR)

https://gist.github.com/x-yuri/8dfe9e561327ad445b1713749cd83252

But I don't understand what triggered it.

Different tools report different time of last reboot:

# last reboot
reboot   system boot  3.16.0-4-amd64   Tue Nov 21 19:55 - 19:56  (00:01)

wtmp begins Thu Nov  2 17:22:02 2017

# who -b
         system boot  2017-11-21 19:55

# journalctl --list-boots
 0 606cc0c448794f2a8573fcdc2ba8d163 Fri 2017-10-13 05:09:18 EEST—Tue
2017-11-28 16:56:18 EET

# uptime
 16:57:21 up 286 days, 13:22,  1 user,  load average: 3.19, 3.32, 3.33

Is there anything I can check? Any suggestions are welcome.

P.S.,

# cat /etc/issue
Debian GNU/Linux 8 \n \l

Regards,
Yuri