[systemd-devel] A server says: "System is going down." But never does.
Yuri Kanivetsky
yuri.kanivetsky at gmail.com
Tue Nov 28 15:02:22 UTC 2017
Hi,
This mailing list is the only place where I expect to have some
helpful feedback. But feel free to suggest other places. I'd like to
investigate situation I have now, find out what went wrong and prevent
it from happening again if possible. Your help is appreciated.
Like I said, a server reports that it's going down, when I ssh to it
as root. As a non-root user, it says that and closes the connection.
In the journal I see a lot of this:
Nov 28 16:22:01 st2 systemd-journal[353]: Journal stopped
Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:22:01 st2 systemd-journal[494]: Journal started
Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service watchdog
timeout (limit 1min)!
Nov 28 16:22:01 st2 systemd-journald[353]: Received SIGTERM from PID 1
(systemd).
Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journald.service entered
failed state.
Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service has no
holdoff time, scheduling restart.
Nov 28 16:22:01 st2 systemd[1]: Stopping Journal Service...
Nov 28 16:22:01 st2 systemd[1]: Starting Journal Service...
Nov 28 16:22:01 st2 systemd[1]: Started Journal Service.
Nov 28 16:22:01 st2 systemd[1]: Starting Trigger Flushing of Journal
to Persistent Storage...
Nov 28 16:22:01 st2 systemd[1]: systemd-journal-flush.service: main
process exited, code=exited, status=1/FAILURE
Nov 28 16:22:01 st2 systemd[1]: Failed to start Trigger Flushing of
Journal to Persistent Storage.
Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journal-flush.service
entered failed state.
Nov 28 16:22:52 st2 systemd[1]: systemd-timesyncd.service start
operation timed out. Terminating.
Nov 28 16:22:52 st2 systemd[1]: Failed to start Network Time Synchronization.
Nov 28 16:22:52 st2 systemd[1]: Unit systemd-timesyncd.service entered
failed state.
Nov 28 16:22:53 st2 systemd[1]: systemd-timesyncd.service has no
holdoff time, scheduling restart.
Nov 28 16:22:53 st2 systemd[1]: Stopping Network Time Synchronization...
Nov 28 16:22:53 st2 systemd[1]: Starting Network Time Synchronization...
Nov 28 16:23:02 st2 systemd-journal[494]: Journal stopped
Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using
624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G
available → current limit 642.1M).
Nov 28 16:23:02 st2 systemd-journal[632]: Journal started
Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service watchdog
timeout (limit 1min)!
Nov 28 16:23:02 st2 systemd-journald[494]: Received SIGTERM from PID 1
(systemd).
Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journald.service entered
failed state.
Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service has no
holdoff time, scheduling restart.
Nov 28 16:23:02 st2 systemd[1]: Stopping Journal Service...
Nov 28 16:23:02 st2 systemd[1]: Starting Journal Service...
Nov 28 16:23:02 st2 systemd[1]: Started Journal Service.
Nov 28 16:23:02 st2 systemd[1]: Starting Trigger Flushing of Journal
to Persistent Storage...
Nov 28 16:23:02 st2 systemd[1]: systemd-journal-flush.service: main
process exited, code=exited, status=1/FAILURE
Nov 28 16:23:02 st2 systemd[1]: Failed to start Trigger Flushing of
Journal to Persistent Storage.
Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journal-flush.service
entered failed state.
It repeats itself every minute.
systemctl doesn't work:
# systemctl
Failed to get D-Bus connection: Connection refused
I have 16 lxc containers running on the server:
# lxc-ls -f | grep RUNNING | wc -l
16
and 16 dbus-daemon's (so supposedly one dbus-daemon is missing):
# ps -ef | grep dbus
message+ 845 1 0 Feb15 ? 00:09:56 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 1615 579 0 Jun13 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
root 1673 28602 0 16:26 pts/31 00:00:00 grep dbus
systemd+ 3761 3461 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 4635 3436 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 4767 3527 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 5344 3597 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 5714 3664 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 5793 3750 0 Feb15 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 7856 7198 0 Oct18 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 9477 8848 0 Oct18 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 10930 10322 0 Oct18 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 13130 10717 0 Jun27 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 13300 11339 0 Apr03 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 19689 19360 0 Jul28 ? 00:00:00 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
systemd+ 21045 20562 0 Oct19 ? 00:00:01 /usr/bin/dbus-daemon
--system --address=systemd: --nofork --nopidfile --systemd-activation
# ps -ef | grep dbus | wc -l
16
My conjecture is that the first dbus-daemon is of the physical host,
since it has ppid == 1, and user messagebus.
On Nov 21 in the log I can see supposedly restart, starting with:
Nov 21 19:55:27 st2 systemd[320]: systemd 215 running in system mode.
(+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
-SECCOMP -APPARMOR)
https://gist.github.com/x-yuri/8dfe9e561327ad445b1713749cd83252
But I don't understand what triggered it.
Different tools report different time of last reboot:
# last reboot
reboot system boot 3.16.0-4-amd64 Tue Nov 21 19:55 - 19:56 (00:01)
wtmp begins Thu Nov 2 17:22:02 2017
# who -b
system boot 2017-11-21 19:55
# journalctl --list-boots
0 606cc0c448794f2a8573fcdc2ba8d163 Fri 2017-10-13 05:09:18 EEST—Tue
2017-11-28 16:56:18 EET
# uptime
16:57:21 up 286 days, 13:22, 1 user, load average: 3.19, 3.32, 3.33
Is there anything I can check? Any suggestions are welcome.
P.S.,
# cat /etc/issue
Debian GNU/Linux 8 \n \l
Regards,
Yuri
More information about the systemd-devel
mailing list