[systemd-devel] systemd-journald watchdog timeout
Lennart Poettering
lennart at poettering.net
Wed Oct 22 07:59:09 PDT 2014
On Wed, 22.10.14 13:10, Damien Robert (damien.olivier.robert+gmane at gmail.com) wrote:
> On one boot I got a watchdog timeout on systemd-journald:
>
> Oct 21 20:08:21 feanor systemd-journal[213]: Permanent journal is using 68.7M (m
> Oct 21 20:08:21 feanor systemd-journal[213]: Time spent on flushing to /var is 2
> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not rea
> Oct 21 20:08:25 feanor kernel: wlan0: authenticate with f4:ca:e5:cc:c7:40
> Oct 21 20:08:25 feanor kernel: wlan0: send auth to f4:ca:e5:cc:c7:40 (try 1/3)
> Oct 21 20:08:25 feanor kernel: wlan0: authenticated
> Oct 21 20:08:25 feanor kernel: wlan0: associate with f4:ca:e5:cc:c7:40 (try 1/3)
> Oct 21 20:08:25 feanor kernel: wlan0: RX AssocResp from f4:ca:e5:cc:c7:40 (capab
> Oct 21 20:08:25 feanor kernel: wlan0: associated
> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link become
> Oct 21 20:08:20 feanor systemd[1]: Started Trigger Flushing of Journal to Persis
> Oct 21 20:08:20 feanor systemd[1]: Started Create Volatile Files and Directories
> Oct 21 20:08:20 feanor systemd[1]: Starting Network Time Synchronization...
> Oct 21 20:08:20 feanor systemd[1]: Starting Update UTMP about System Boot/Shutdo
> Oct 21 20:08:20 feanor systemd[1]: Started Update UTMP about System Boot/Shutdow
> Oct 21 20:08:20 feanor systemd[1]: Started Network Time Synchronization.
> Oct 21 20:08:20 feanor systemd[1]: Starting System Initialization.
> Oct 21 20:08:21 feanor systemd[1]: Reached target System Initialization.
> Oct 21 20:08:21 feanor systemd[1]: Starting sshd.socket.
> Oct 21 20:08:21 feanor systemd[1]: Listening on sshd.socket.
> Oct 21 20:08:21 feanor systemd[1]: Starting D-Bus System Message Bus Socket.
> Oct 21 20:08:25 feanor systemd-logind[516]: New seat seat0.
> Oct 21 20:08:25 feanor systemd-networkd[518]: lo : gained carrier
> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
> Oct 21 20:08:25 feanor systemd[1]: Started Network Service.
> Oct 21 20:08:25 feanor systemd[1]: Starting Network.
> Oct 21 20:08:25 feanor systemd[1]: Reached target Network.
> Oct 21 20:08:25 feanor systemd[1]: Starting Network Name Resolution...
> Oct 21 20:11:20 feanor systemd-journal[574]: Permanent journal is using 75.1M (m
> Oct 21 20:11:20 feanor kernel: e1000e 0000:00:19.0: irq 55 for MSI/MSI-X
> Oct 21 20:11:20 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not read
> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service watchdog timeout (li
> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
> Oct 21 20:11:20 feanor systemd[1]: Started Getty on tty2.
> Oct 21 20:11:20 feanor systemd[1]: lightdm.service start operation timed out. Te
> Oct 21 20:11:20 feanor systemd[1]: systemd-timesyncd.service watchdog timeout (l
> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service stop-sigterm timed o
> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service: main process exited
> Oct 21 20:11:20 feanor systemd[1]: Unit systemd-journald.service entered failed
> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service has no holdoff time,
> Oct 21 20:11:20 feanor systemd[1]: Stopping Journal Service...
> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
> Oct 21 20:11:20 feanor systemd[1]: Started Journal Service.
>
> As you can see at 20:08:20 things appears to look fine, but everything
> get stuck during 3 minutes until there is a watchdog timeout for the
> journal and it gets restarted. Then the boot finished without any further
> problem. The next time everything worked fine, so I am wondering what could
> have happened?
That's difficult to say just from these logs. Can you reliably
reproduce this? If so, can you attach strace to journald before this
happens and see what it is doing?
What I find interesting is that after the 3min pause suddenly the
network driver allocates irq 55. This kinda makes me wonder if this
might be a kernel/driver problem of some kind, where for some reason
userspace (and hence journald) don't get scheduled?
What distro/version/arch is this?
Lennart
--
Lennart Poettering, Red Hat
More information about the systemd-devel
mailing list