[systemd-devel] Cannot use systemctl after heavy swapping

Lennart Poettering lennart at poettering.net
Wed Jan 28 18:25:26 PST 2015


On Wed, 07.01.15 07:59, Alan Fisher (acf at unixcube.org) wrote:

> Hello!
> 
> I seem to have reproduced this issue. After a lot of swapping, systemd
> appeared to have become stuck. Trying to restart services with systemctl
> blocked indefinitely. Strangely, this seemed to be the case even after a
> reboot.
> 
> Here is part of the output of strace -p 1:
> 
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624419539}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624668458}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624919333}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625167344}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625417381}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625665881}) = 0
> 
> systemd --version prints
> 
> systemd 215
> +PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP
> -APPARMOR
> 
> After a second reboot, the problem seems to have disappeared.

Sorry for the late reply!

Hmm, this looks like an EAGAIN busy loop in PID 1. Three questions:

a) That fd 16, do you have any idea what it is? What does
   /proc/1/fd/ say about it? If it is a socket, can you check with
   lsof which peer it is talking to?

b) Any chance you can run "pstack 1" when this happens to get a stack
   trace out of PID 1?

c) Any chance you can reproduce the issue with a more current systemd version?
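
For (a), the look at /proc/1/fd/ can be scripted. The following is only a minimal sketch, assuming Python 3 on Linux and root privileges when pointed at PID 1; describe_fd is a hypothetical helper name, not part of any systemd tooling. It reports what the fd points at and, for a socket, the inode number that can then be matched against lsof output to find the peer.

```python
import os
import re


def describe_fd(pid, fd):
    """Return (target, inode) for /proc/<pid>/fd/<fd>.

    target is the symlink text, e.g. "socket:[12345]" or a file path.
    inode is the socket inode as an int, or None if the fd is not a socket.
    """
    target = os.readlink(f"/proc/{pid}/fd/{fd}")
    match = re.fullmatch(r"socket:\[(\d+)\]", target)
    inode = int(match.group(1)) if match else None
    return target, inode
```

Run as root, describe_fd(1, 16) would correspond to the fd seen in the strace output above.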

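To illustrate what an EAGAIN busy loop of this shape can look like: the sketch below is an assumption-laden toy, not systemd's actual code. It assumes Python 3 on Linux, and the function name spin is made up for the example. A socket is watched for EPOLLOUT, epoll reports it writable straight away, the read attempt fails with EAGAIN, and the loop wakes up again without making progress.

```python
import select
import socket


def spin(iterations=5):
    """Count wakeups where epoll reports EPOLLOUT but the read gets EAGAIN."""
    a, b = socket.socketpair()
    a.setblocking(False)
    ep = select.epoll()
    ep.register(a.fileno(), select.EPOLLOUT)
    wasted = 0
    try:
        for _ in range(iterations):
            # Timeout 0, as in the strace: returns at once because an idle
            # socket is always writable.
            if not ep.poll(0):
                break
            try:
                a.recv(4096, socket.MSG_DONTWAIT)
            except BlockingIOError:
                # EAGAIN: nothing to read, yet the next poll fires again.
                wasted += 1
    finally:
        ep.close()
        a.close()
        b.close()
    return wasted
```

Every iteration is wasted work, which is why a loop like this pins a CPU when nothing bounds it.
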
Lennart

-- 
Lennart Poettering, Red Hat

