[systemd-devel] Looping too fast. Throttling execution a little

John Morrissey jwm at horde.net
Wed Apr 29 08:15:49 PDT 2015


On Wed, Apr 29, 2015 at 11:46:50AM +0200, Lennart Poettering wrote:
> On Tue, 28.04.15 19:25, John Morrissey (jwm at horde.net) wrote:
> > On 18 Feb 2015, at 18:47, Lennart Poettering <lennart at poettering.net> wrote:
> > > Hmm, this appears to be caused by a timer that is not reset. First the
> > > timer fd is set to the earliest possible trigger, then epoll_wait() is
> > > entered, which immediately quits. Then the timerfd elapse counter is
> > > read, which is 1.
> > > 
> > > It would be interesting to figure out which timer this is.
> > > 
> > > To figure that out, could you reproduce the issue and then use gdb:
> > > 
> > > 1. Type "gdb" to start it
> > > 2. Type "attach 1" to attach to PID 1
> > > 3. Type "b source_dispatch" to set a break point on the source_dispatch function
> > > 4. Type "c" to continue execution
> > > 5. This should then break on the next execution of the source_dispatch function
> > > 6. This should happen immediately; after all, PID 1 is busy looping
> > >   around a timer. Use "p s->description" to get a short description
> > >   string for the event that is being dispatched. In fact, please use
> > >   "p *s" to get all data about the event, and paste it here.
> > 
> > I noticed this behavior recently on a Debian jessie system running systemd
> > 215-17. systemd got itself in a loop like the previous reporter's:
[snip]
> > --
> > recvmsg(42, 0x7fff3ea64d00, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> > timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 1}}, NULL) = 0
> > epoll_wait(4, {{EPOLLIN, {u32=3, u64=3}}}, 36, 0) = 1
> > clock_gettime(CLOCK_BOOTTIME, {1959524, 957887776}) = 0
> > read(3, "\1\0\0\0\0\0\0\0", 8)          = 8
> > recvmsg(42, 0x7fff3ea64d00, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> 
> Any chance you can check what fd 42 refers to? See /proc/1/fd/42 and lsof?

Not easily. The system was basically unusable once it got into this state;
it's a production machine, and being able to start and stop services is in
the critical path.

Thankfully, a reboot fixed it and it hasn't recurred in the couple of days
since, but I thought I'd follow up with the struct output since someone else
reported exactly the same behavior a couple of months ago.

-john


More information about the systemd-devel mailing list