[systemd-devel] systemd "looping too fast" after an automatic attempt to restart anacron

Alberto Bertogli albertito at blitiri.com.ar
Sat Mar 28 03:49:59 PDT 2015


Hi!

I noticed by chance that systemd was using ~15% of a CPU in my laptop,
according to top.

This is on Debian testing, systemd 215-12.

The machine has been up since January, but this started to happen
earlier today, while I was asleep, and it seems to be correlated with an
automatic restart of anacron.

The 3.18.3 kernel was built from vanilla sources and has never had
problems with systemd so far.

I asked on #systemd and they suggested me to post here, following up
with the information requested at
http://lists.freedesktop.org/archives/systemd-devel/2015-February/028541.html
(which seems to be a similar problem, if not the same).

So below I paste some general information that I sent on IRC, the gdb
output as requested in the email from February, and some information about the
suspicious anacron restart.


I hope this helps! I won't restart systemd for a day or two (unless it
gets worse, I can spare 15% of a CPU), so if there's any other
information you need, please let me know.

Thanks!
		Alberto


----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

top reports systemd (pid/tid 1) consuming ~15% of a CPU all the time.

vmstat shows high CPU usage with almost nothing running (not even a
browser) and a lot of context switches:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1775176 158244 1267832    0    0     0     0  341 45293  5  3 92  0  0
 0  0      0 1775176 158244 1267832    0    0     0     0  352 50775  5  3 93  0  0
 2  0      0 1775920 158244 1267832    0    0     0    80  341 16583  4  2 93  1  0
 0  0      0 1774796 158244 1267832    0    0     0     0  330 32687  4  2 94  0  0
 0  0      0 1774812 158244 1267836    0    0     0     0  337 50805  5  3 92  0  0


$ tail /var/log/syslog
Mar 28 09:51:14 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:15 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:16 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:17 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:19 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:20 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:21 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:22 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:24 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 09:51:25 hostname systemd[1]: Looping too fast. Throttling execution a little.


An strace shows it quite busy too, doing the same loop over an over. Here's
a snippet:

1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5864380, 698340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}}, 41, 0) = 1
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657137453}) = 0
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5856161, 198340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}, {EPOLLIN, {u32=3, u64=3}}}, 41, 0) = 2
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657236489}) = 0
1     09:46:32 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5864380, 698340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}}, 41, 0) = 1
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657356895}) = 0
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5856161, 198340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}, {EPOLLIN, {u32=3, u64=3}}}, 41, 0) = 2
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657456514}) = 0
1     09:46:32 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5864380, 698340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}}, 41, 0) = 1
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657575831}) = 0
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5856161, 198340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}, {EPOLLIN, {u32=3, u64=3}}}, 41, 0) = 2
1     09:46:32 clock_gettime(CLOCK_BOOTTIME, {5864326, 657674270}) = 0
1     09:46:32 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
1     09:46:32 recvmsg(17, 0x7fff5f588a80, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
1     09:46:32 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={5864380, 698340000}}, NULL) = 0
1     09:46:32 epoll_wait(4, {{EPOLLOUT, {u32=2552835776, u64=139752198713024}}}, 41, 0) = 1


The recvmsg() attempts are made on fd 17, which is a UNIX socket:

$ sudo ls -lsa /proc/1/fd/17
0 lrwx------ 1 root root 64 Mar 28 09:49 /proc/1/fd/17 -> socket:[24030250]

$ sudo netstat -nap | grep 24030250
unix  3      [ ]         STREAM     CONNECTED     24030250 1/systemd           /run/systemd/private

$ sudo lsof -n | grep 24030250
systemd       1             root   17u     unix 0xffff8801e18c6900      0t0   24030250 /run/systemd/private

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

$ sudo apt-get install systemd-dbg

$ sudo gdb
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

(gdb) attach 1
Attaching to process 1
Reading symbols from /lib/systemd/systemd...Reading symbols from /usr/lib/debug/.build-id/6f/91c0b132306cee80c188b0d68d1eafa6690850.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib/x86_64-linux-gnu/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libselinux.so.1
Reading symbols from /lib/x86_64-linux-gnu/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libpam.so.0
Reading symbols from /lib/x86_64-linux-gnu/libaudit.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libaudit.so.1
Reading symbols from /lib/x86_64-linux-gnu/libcap.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libcap.so.2
Reading symbols from /lib/x86_64-linux-gnu/libkmod.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libkmod.so.2
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libpcre.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libpcre.so.3
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libattr.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libattr.so.1
0x00007f1a96fad6fd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.

(gdb) b source_dispatch
Breakpoint 1 at 0x7f1a97707e30: file ../src/libsystemd/sd-event/sd-event.c, line 1976.

(gdb) c
Continuing.

Breakpoint 1, source_dispatch (s=0x7fff5f589e40) at ../src/libsystemd/sd-event/sd-event.c:1976
1976    ../src/libsystemd/sd-event/sd-event.c: No such file or directory.

(gdb) bt
#0  source_dispatch (s=0x7fff5f589e40) at ../src/libsystemd/sd-event/sd-event.c:1976
#1  0x00007f1a9770922f in sd_event_run (e=0x7f1a97ee69d0, timeout=<optimized out>) at ../src/libsystemd/sd-event/sd-event.c:2314
#2  0x00007f1a977b2211 in manager_loop (m=0x7f1a97ee6370) at ../src/core/manager.c:1912
#3  0x00007f1a9770022b in main (argc=4, argv=<optimized out>) at ../src/core/main.c:1743
(gdb) p *s
$1 = {n_ref = 4, event = 0x100007f1a, userdata = 0x7f1a97ee6da0, prepare = 0x300000001, type = SOURCE_IO, enabled = 0, pending = false, dispatching = false, 
  floating = false, priority = 139752195344224, pending_index = 0, prepare_index = 0, pending_iteration = 0, prepare_iteration = 0, 
  sources_next = 0x7f1a97a0b690 <bus_unit_vtable+3600>, sources_prev = 0x991f03dd9945a800, {io = {callback = 0x1, fd = -1745178096, events = 32538, 
      revents = 1, registered = false}, time = {callback = 0x1, next = 139752195666448, accuracy = 1, earliest_index = 2541510246, latest_index = 32538}, 
    signal = {callback = 0x1, siginfo = {ssi_signo = 2549789200, ssi_errno = 32538, ssi_code = 1, ssi_pid = 0, ssi_uid = 2541510246, ssi_fd = 32538, 
        ssi_tid = 2543892112, ssi_band = 32538, ssi_overrun = 2552908800, ssi_trapno = 32538, ssi_status = 1, ssi_int = 0, ssi_ptr = 139752186927493, 
        ssi_utime = 139752195666448, ssi_stime = 139752187010661, ssi_addr = 32, 
        __pad = "\267\240X_\377\177\000\000@\237X_\377\177\000\000\000\000\000\000\000\000\000\000`\237X_\377\177\000\000@\272\372\227\032\177\000\000\000L*\230\032\177\000"}, sig = 1599643448}, child = {callback = 0x1, siginfo = {si_signo = -1745178096, si_errno = 32538, si_code = 1, _sifields = {_pad = {
            -1753457050, 32538, -1751075184, 32538, -1742058496, 32538, 1, 0, -1753917051, 32538, -1745178096, 32538, -1753833883, 32538, 32, 0, 1599643831, 
            32767, 1599643456, 32767, 0, 0, 1599643488, 32767, -1745176000, 32538, -1742058496, 32538}, _kill = {si_pid = -1753457050, si_uid = 32538}, 
          _timer = {si_tid = -1753457050, si_overrun = 32538, si_sigval = {sival_int = -1751075184, sival_ptr = 0x7f1a97a0b690 <bus_unit_vtable+3600>}}, 
          _rt = {si_pid = -1753457050, si_uid = 32538, si_sigval = {sival_int = -1751075184, sival_ptr = 0x7f1a97a0b690 <bus_unit_vtable+3600>}}, _sigchld = {
            si_pid = -1753457050, si_uid = 32538, si_status = -1751075184, si_utime = 139752198786048, si_stime = 1}, _sigfault = {si_addr = 0x7f1a977c5e66, 
            si_addr_lsb = -18800}, _sigpoll = {si_band = 139752187387494, si_fd = -1751075184}, _sigsys = {_call_addr = 0x7f1a977c5e66, 
            _syscall = -1751075184, _arch = 32538}}}, pid = 1599643448, options = 32767}, defer = {callback = 0x1}, post = {callback = 0x1}, exit = {
      callback = 0x1, prioq_index = 2549789200}}}

(gdb) p s->description
There is no member named description.

(gdb) 

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

This is where the problems start, according to syslog. I was asleep, so
there wasn't any user activity then:

Mar 28 07:30:01 hostname CRON[20107]: (root) CMD (test -x /etc/init.d/anacron && /usr/sbin/invoke-rc.d anacron start >/dev/null)
Mar 28 07:30:02 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 07:30:03 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 07:30:04 hostname systemd[1]: Looping too fast. Throttling execution a little.
Mar 28 07:30:05 hostname systemd[1]: Looping too fast. Throttling execution a little.

I have no idea if the anacron restart is related or not, but I think it
may be.

This is from yesterday's logs:

Mar 27 07:30:01 hostname CRON[3821]: (root) CMD (test -x /etc/init.d/anacron && /usr/sbin/invoke-rc.d anacron start >/dev/null)
Mar 27 07:30:01 hostname systemd[1]: Starting Run anacron jobs...
Mar 27 07:30:01 hostname systemd[1]: Started Run anacron jobs.
Mar 27 07:30:01 hostname anacron[3841]: Anacron 2.3 started on 2015-03-27
Mar 27 07:30:01 hostname anacron[3841]: Will run job `cron.daily' in 5 min.
Mar 27 07:30:01 hostname anacron[3841]: Jobs will be executed sequentially
Mar 27 07:35:01 hostname anacron[3841]: Job `cron.daily' started
Mar 27 07:35:01 hostname anacron[3859]: Updated timestamp for job `cron.daily' to 2015-03-27


Note there are no mentions of anacron in today's logs, after the problem
started.

This is a snippet from a current "ps auxf":

root       847  0.0  0.0  27492  1008 ?        Ss   Jan19   0:10 /usr/sbin/cron -f
root     20106  0.0  0.0  63384  2984 ?        S    07:30   0:00  \_ /usr/sbin/CRON -f
root     20107  0.0  0.0   4328   724 ?        Ss   07:30   0:00      \_ /bin/sh -c test -x /etc/init.d/anacron && /usr/sbin/invoke-rc.d anacron start >/dev/nu
root     20108  0.0  0.0   4328  1576 ?        S    07:30   0:00          \_ /bin/sh /usr/sbin/invoke-rc.d anacron start
root     20126  0.0  0.0  22476  2584 ?        S    07:30   0:00              \_ systemctl start anacron.service

And a backtrace of that systemctl start:

$ sudo gdb
[...]
(gdb) attach 20126
[...]
(gdb) bt
#0  0x00007f15253ca579 in __GI_ppoll (fds=fds at entry=0x7fff397c6070, nfds=1, timeout=<optimized out>, sigmask=sigmask at entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:51
#1  0x00007f15253e346e in __ppoll_chk (fds=fds at entry=0x7fff397c6070, nfds=<optimized out>, timeout=<optimized out>, ss=ss at entry=0x0, fdslen=fdslen at entry=16)
    at ppoll_chk.c:28
#2  0x00007f1525b2af2d in __ppoll_alias () at /usr/include/x86_64-linux-gnu/bits/poll2.h:71
#3  bus_poll.lto_priv.155 (bus=0x7f1526b62020, need_more=<optimized out>, timeout_usec=<optimized out>) at ../src/libsystemd/sd-bus/sd-bus.c:2697
#4  0x00007f1525ae775e in bus_process_wait (bus=<optimized out>) at ../src/systemctl/systemctl.c:2336
#5  wait_for_jobs (s=<optimized out>, bus=<optimized out>) at ../src/systemctl/systemctl.c:2383
#6  start_unit.lto_priv.318 (bus=0x7f1526b62020, args=0x7f1526b65be0) at ../src/systemctl/systemctl.c:2756
#7  0x00007f1525ae3b12 in systemctl_main (bus_error=0, argv=0x7fff397c63b8, argc=<optimized out>, bus=0x7f1526b62020) at ../src/systemctl/systemctl.c:6730
#8  main (argc=<optimized out>, argv=0x7fff397c63b8) at ../src/systemctl/systemctl.c:6985

strace shows it's stuck in ppoll():

$ sudo strace -p 20126 -f -F
Process 20126 attached
ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8


And some information about its open file descriptors in case it helps:

$ sudo ls -lsa /proc/20126/fd/
total 0
0 dr-x------ 2 root root  0 Mar 28 09:49 .
0 dr-xr-xr-x 8 root root  0 Mar 28 08:49 ..
0 lr-x------ 1 root root 64 Mar 28 09:49 0 -> pipe:[24028542]
0 l-wx------ 1 root root 64 Mar 28 09:49 1 -> /dev/null
0 lrwx------ 1 root root 64 Mar 28 09:49 2 -> /tmp/tmpfidsUe1 (deleted)
0 lrwx------ 1 root root 64 Mar 28 09:49 3 -> socket:[24028552]

$ sudo lsof -n -p 20126 | grep 24028542
systemctl 20126 root    0r  FIFO                0,8      0t0 24028542 pipe

$ sudo lsof -n -p 20126 | grep 24028552
systemctl 20126 root    3u  unix 0xffff8801e18c5b00      0t0 24028552 @004cc

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----



More information about the systemd-devel mailing list