[systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

Krzysztof Kotlenga k.kotlenga at sims.pl
Tue Sep 22 05:17:28 PDT 2015


Hi.

There were many reports of "Looping too fast. Throttling execution a
little" behavior but they never got anywhere. Not this time. At least
for this particular case.

Initially I was able to reproduce this issue just by calling `systemctl
restart some_specific_service` - then most of further `systemctl`
actions (except status) caused systemd to enter an infinite event loop.

Long story short, I found out that entering some garbage
on /run/systemd/private with netcat breaks systemd equally well.
`systemctl daemon-reexec` fixes it. But before you decide to break
yours PID 1 event loop, I recommend the following gdb script, because
it provides some possibly interesting output:

#v+
b event_prepare
commands 1
    set $size = e->prepare->n_items
    set $enabled_after_disabled = 0
    set $i = 0
    while $i < $size
        set $s = (sd_event_source*)e->prepare->items[$i].data
        if $s->type == SOURCE_IO
            printf "%s e=%d pri=%d pi=%u fd=%d\n", \
                $s->description, $s->enabled, $s->priority, \
                $s->pending_iteration, $s->io.fd
        else
            printf "%s e=%d\n", $s->description, $s->enabled
        end

        if $i++ > 0
            if $s->enabled == SD_EVENT_ON && $previous->enabled == SD_EVENT_OFF
                set $enabled_after_disabled = 1
            end
        end

        set $previous = $s
    end

    if $enabled_after_disabled
        print "XXX shouldn't happen?"
    else
        continue
    end
end
#v-

Now let's follow this. Output shortened for brevity.

1. Attach to systemd and let it run. You will most likely see some
event source, not of particular interest, like this:

bus-input e=1 pri=0 pi=0 fd=12

2.  All is good. Run `nc -U /run/systemd/private`

bus-input e=1 pri=0 pi=0 fd=18
bus-input e=1 pri=0 pi=0 fd=12

3. We are here too. Enter a newline.

bus-input e=1 pri=0 pi=0 fd=12
bus-input e=0 pri=0 pi=15 fd=18

4. This bus-input event source got disabled by source_dispatch, because
bus_socket_auth_verify_server returned -EIO. Now wait, hit Ctrl+C,
generate some events with systemctl - whatever.

bus-input e=0 pri=0 pi=15 fd=18
bus-input e=1 pri=0 pi=0 fd=12
$1 = "XXX shouldn't happen?"

5. The event loop is broken now. It haven't entered an infinite loop
*yet* but calls like `systemctl restart whatever` will cause it. The
reason is that prepare callbacks in event_prepare (sd-event.c) aren't
called anymore - the for loop there breaks on the first disabled source.

The prepare callback of the bus that systemctl talks on (most
importantly sd_bus_get_events) is called only once and sets
POLLIN|POLLOUT flags. That POLLOUT is left there as long as the client
is on the bus and causes an infinite event loop.

When we exit that hanging systemctl, systemd quits looping but the
event loop is still broken. That disabled bus-input from point 4 is
left there forever and prevents prepare callbacks of the enabled event
sources to run.

So, how many bugs are here?

It seems that there are other errors than can happen in sd-bus /
bus-socket, which bus_process_internal (sd-bus.c) doesn't handle and
leaves the disabled event source sitting there forever. Those disabled
event sources are:
- a memory leak?
- very good at breaking event_prepare (sd-event.c).

What's going on with order in that "prepare" Prioq?

This has been debugged on v225, but AFAICS master has the same issues.

-- 
kjk


More information about the systemd-devel mailing list