[systemd-devel] systemd-journald watchdog timeout

Umut Tezduyar Lindskog umut at tezduyar.com
Wed Oct 22 09:35:41 PDT 2014


On Wed, Oct 22, 2014 at 4:59 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Wed, 22.10.14 13:10, Damien Robert (damien.olivier.robert+gmane at gmail.com) wrote:
>
>> On one boot I got a watchdog timeout on systemd-journald:
>>
>> Oct 21 20:08:21 feanor systemd-journal[213]: Permanent journal is using 68.7M (m
>> Oct 21 20:08:21 feanor systemd-journal[213]: Time spent on flushing to /var is 2
>> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not rea
>> Oct 21 20:08:25 feanor kernel: wlan0: authenticate with f4:ca:e5:cc:c7:40
>> Oct 21 20:08:25 feanor kernel: wlan0: send auth to f4:ca:e5:cc:c7:40 (try 1/3)
>> Oct 21 20:08:25 feanor kernel: wlan0: authenticated
>> Oct 21 20:08:25 feanor kernel: wlan0: associate with f4:ca:e5:cc:c7:40 (try 1/3)
>> Oct 21 20:08:25 feanor kernel: wlan0: RX AssocResp from f4:ca:e5:cc:c7:40 (capab
>> Oct 21 20:08:25 feanor kernel: wlan0: associated
>> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link become
>> Oct 21 20:08:20 feanor systemd[1]: Started Trigger Flushing of Journal to Persis
>> Oct 21 20:08:20 feanor systemd[1]: Started Create Volatile Files and Directories
>> Oct 21 20:08:20 feanor systemd[1]: Starting Network Time Synchronization...
>> Oct 21 20:08:20 feanor systemd[1]: Starting Update UTMP about System Boot/Shutdo
>> Oct 21 20:08:20 feanor systemd[1]: Started Update UTMP about System Boot/Shutdow
>> Oct 21 20:08:20 feanor systemd[1]: Started Network Time Synchronization.
>> Oct 21 20:08:20 feanor systemd[1]: Starting System Initialization.
>> Oct 21 20:08:21 feanor systemd[1]: Reached target System Initialization.
>> Oct 21 20:08:21 feanor systemd[1]: Starting sshd.socket.
>> Oct 21 20:08:21 feanor systemd[1]: Listening on sshd.socket.
>> Oct 21 20:08:21 feanor systemd[1]: Starting D-Bus System Message Bus Socket.
>> Oct 21 20:08:25 feanor systemd-logind[516]: New seat seat0.
>> Oct 21 20:08:25 feanor systemd-networkd[518]: lo              : gained carrier
>> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
>> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
>> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on /dev/inpu
>> Oct 21 20:08:25 feanor systemd[1]: Started Network Service.
>> Oct 21 20:08:25 feanor systemd[1]: Starting Network.
>> Oct 21 20:08:25 feanor systemd[1]: Reached target Network.
>> Oct 21 20:08:25 feanor systemd[1]: Starting Network Name Resolution...
>> Oct 21 20:11:20 feanor systemd-journal[574]: Permanent journal is using 75.1M (m
>> Oct 21 20:11:20 feanor kernel: e1000e 0000:00:19.0: irq 55 for MSI/MSI-X
>> Oct 21 20:11:20 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not read
>> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service watchdog timeout (li
>> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
>> Oct 21 20:11:20 feanor systemd[1]: Started Getty on tty2.
>> Oct 21 20:11:20 feanor systemd[1]: lightdm.service start operation timed out. Te
>> Oct 21 20:11:20 feanor systemd[1]: systemd-timesyncd.service watchdog timeout (l
>> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service stop-sigterm timed o
>> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
>> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service: main process exited
>> Oct 21 20:11:20 feanor systemd[1]: Unit systemd-journald.service entered failed
>> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service has no holdoff time,
>> Oct 21 20:11:20 feanor systemd[1]: Stopping Journal Service...
>> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service...
>> Oct 21 20:11:20 feanor systemd[1]: Started Journal Service.
>>
>> As you can see at 20:08:20 things appears to look fine, but everything
>> get stuck during 3 minutes until there is a watchdog timeout for the
>> journal and it gets restarted. Then the boot finished without any further
>> problem. The next time everything worked fine, so I am wondering what could
>> have happened?
>
> That's difficult to say just from these logs. Can you reliably
> reproduce this? If so, can you attach strace to journald before this
> happens and see what it is doing?

I am wondering if we should send another signal that generates core
when the watchdog is not fed. It would be nice to get the coredump of
a frozen process.

Umut

>
> What I find interesting is that after the 3min pause suddenly the
> network driver allocates irq 55. This kinda makes me wonder if this
> might be a kernel/driver problem of some kind, where for some reason
> userspace (and hence journald) don't get scheduled?
>
> What distro/version/arch is this?
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel


More information about the systemd-devel mailing list