[systemd-devel] Cannot use systemctl after heavy swapping
Lennart Poettering
lennart at poettering.net
Wed Jan 28 18:25:26 PST 2015
On Wed, 07.01.15 07:59, Alan Fisher (acf at unixcube.org) wrote:
> Hello!
>
> I seem to have reproduced this issue. After a lot of swapping, systemd
> appeared to have become stuck. Trying to restart services with systemctl
> blocked indefinitely. Strangely, this seemed to be the case even after a
> reboot.
>
> Here is part of the output of strace -p 1:
>
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624419539}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624668458}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 624919333}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625167344}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625417381}) = 0
> recvmsg(16, 0x7fff52622560, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(4, {{EPOLLOUT, {u32=3793072544, u64=140341849469344}}}, 29, 0) = 1
> clock_gettime(CLOCK_BOOTTIME, {863156, 625665881}) = 0
>
> systemd --version prints
>
> systemd 215
> +PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR
>
> After a second reboot, the problem seems to have disappeared.
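The trace quoted above repeats a single pattern: a non-blocking recvmsg() on fd 16 fails with EAGAIN, epoll_wait() with a zero timeout immediately reports one EPOLLOUT event, and the loop goes round again without ever sleeping. One way such a loop can arise (an assumption for illustration, not a description of systemd's actual event loop) is a level-triggered EPOLLOUT watch on a socket whose send buffer has room: epoll reports it on every call, while the read side keeps returning EAGAIN. The Python sketch below reproduces that shape on Linux with a socketpair(); the function name spin_count is hypothetical. The trace alone does not show which fd the EPOLLOUT event belongs to, because the event data (u64=140341849469344) appears to be a pointer rather than an fd number.

```python
import select
import socket


def spin_count(iterations: int = 1000) -> int:
    """Count loop iterations that wake up but find nothing to read.

    Hypothetical illustration only (Linux): a level-triggered EPOLLOUT
    watch on a writable socket fires on every epoll poll, while a
    non-blocking recvmsg() on the same socket keeps failing with EAGAIN.
    """
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        # Level-triggered is the default: the event is reported for as
        # long as the socket stays writable, i.e. on every call.
        ep.register(a.fileno(), select.EPOLLOUT)
        wasted = 0
        for _ in range(iterations):
            events = ep.poll(0)  # zero timeout, like epoll_wait(..., 0) in the trace
            if not any(mask & select.EPOLLOUT for _, mask in events):
                break  # would only happen if the socket stopped being writable
            try:
                # Same flag as in the trace: never block, fail with EAGAIN instead.
                a.recvmsg(4096, 0, socket.MSG_DONTWAIT)
            except BlockingIOError:
                wasted += 1  # woke up, read nothing, go round again
                continue
            break  # data actually arrived; not the case in this sketch
        return wasted
    finally:
        ep.close()
        a.close()
        b.close()
```

Since nothing is ever written to the other end, every iteration is a wasted wake-up, so spin_count(1000) counts all 1000 iterations. In a real daemon the same shape burns CPU indefinitely, which matches the timestamps in the trace advancing by roughly 250 microseconds per round.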
Sorry for the late reply!
Hmm, this looks like an EAGAIN busy loop in PID 1. Three questions:
a) Do you have any idea what fd 16 is? What does /proc/1/fd/ say
about it? If it is a socket, can you use lsof to check which peer it
is talking to?
b) Any chance you can run "pstack 1" when this happens to get a stack
trace out of PID 1?
c) Any chance you can reproduce the issue with a more current systemd version?
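For question a), the fd can also be resolved without a shell. The following minimal Python sketch (Linux only; the helper names describe_fd and socket_inode are hypothetical) reads the /proc/<pid>/fd/<fd> symlink, which for a socket has the form socket:[<inode>], and extracts the inode number. Inspecting PID 1 requires root.

```python
import os
import re
from typing import Optional

# Sockets appear in /proc/<pid>/fd as links of the form "socket:[<inode>]".
_SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def describe_fd(pid: int, fd: int) -> str:
    """Return the target of /proc/<pid>/fd/<fd>.

    Regular files resolve to a path; sockets show up as 'socket:[<inode>]',
    pipes as 'pipe:[<inode>]', and epoll instances as 'anon_inode:[eventpoll]'.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def socket_inode(link: str) -> Optional[int]:
    """Extract the inode number from a 'socket:[N]' link, or None otherwise."""
    m = _SOCKET_LINK.fullmatch(link)
    return int(m.group(1)) if m else None
```

For example, running socket_inode(describe_fd(1, 16)) as root while the loop is happening would give the socket's inode, which can then be matched against the NODE column of lsof output as a starting point for finding the peer.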
Lennart
--
Lennart Poettering, Red Hat