[systemd-devel] forever loop during garbage collection
Umut Tezduyar Lindskog
umut at tezduyar.com
Mon Dec 29 07:26:00 PST 2014
On Wednesday, December 10, 2014, Umut Tezduyar Lindskog <umut at tezduyar.com> wrote:
> On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering wrote:
> > On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (umut at tezduyar.com) wrote:
> >> Hi,
> >> We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
> >> Problem happens when systemd runs in sysV compatibility mode (Poky
> >> enables this).
> >> Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
> >> like unit_gc_sweep cannot make a decision about the unit. As a result,
> >> it marks the unit with offset_unsure and adds the unit back to gc
> >> queue.
> >> If I am reading the code correctly recursive unit_gc_sweep will never
> >> be able to remove the unit from the gc queue if it is referenced by
> >> another unit and if another unit is referenced by the unit.
> >> A is referenced by B
> >> B is referenced by A
> > So in this case first A will be processed by the GC sweep, it will
> > follow the link to B while setting the state to IN_PATH and invoke the
> > GC sweep on that. B will then be set to IN_PATH too. GC sweep now
> > follows its link back, and ends up at A again, but this time returns
> > quickly because its state is set to IN_PATH. Due to this, it will then
> > set B's state to UNSURE, and return to A, which in effect will now be
> > set to UNSURE too. Now, we return into the GC queue dispatch call,
> > which will notice that it is UNSURE and upgrade that to BAD, and kill
> > it because there's nothing in the unit's dependency network that is
> > clearly GOOD, and hence it should be removed.
> > The essence of cycle breaking here is really in
> > manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
> > am not seeing how this could end up in an endless loop hence.
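The walk-through above can be sketched as a small Python model. This is only an illustration of the marker logic as described in this thread, not systemd's actual C code: the Unit class, the "pinned" flag (standing in for "this unit is clearly still needed") and the function names are all hypothetical.

```python
from enum import Enum


class Mark(Enum):
    NONE = 0
    GOOD = 1
    BAD = 2
    IN_PATH = 3
    UNSURE = 4


class Unit:
    def __init__(self, name, pinned=False):
        self.name = name
        self.pinned = pinned      # hypothetical: True means "clearly still needed"
        self.referenced_by = []   # units that hold a reference to this one
        self.mark = Mark.NONE


def sweep(u):
    # Already decided, or already on the current path: return quickly.
    if u.mark in (Mark.GOOD, Mark.BAD, Mark.IN_PATH):
        return
    if u.pinned:
        u.mark = Mark.GOOD
        return
    u.mark = Mark.IN_PATH
    all_bad = True
    for other in u.referenced_by:
        sweep(other)
        if other.mark is Mark.GOOD:
            u.mark = Mark.GOOD    # something GOOD references us, so we stay
            return
        if other.mark is not Mark.BAD:
            all_bad = False       # IN_PATH or UNSURE: no decision possible yet
    u.mark = Mark.BAD if all_bad else Mark.UNSURE


def dispatch(queue):
    # The anchor of each sweep upgrades UNSURE to BAD; this breaks cycles.
    collected = []
    for u in queue:
        sweep(u)
        if u.mark in (Mark.BAD, Mark.UNSURE):
            u.mark = Mark.BAD
            collected.append(u.name)
    return collected


def demo_cycle():
    # A is referenced by B, B is referenced by A, nothing else holds them.
    a, b = Unit("A"), Unit("B")
    a.referenced_by.append(b)
    b.referenced_by.append(a)
    return dispatch([a, b])
```

Calling demo_cycle() collects both A and B in this model: A ends its sweep as UNSURE and is upgraded to BAD by the dispatcher, after which B sees only a BAD referrer and becomes BAD as well.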
> I have debugged it more and, as you have said, there is no bug in the
> code, but it takes so long to get out of unit_gc_sweep that I thought
> there was a forever loop.
> Attached is my patch on 216 and
> is a part of the log after patch.
> It has been 3 hours since I issued "systemctl isolate" and according
> to the logs I can see that garbage collection logic is making its way
> back up. I guess it will eventually resolve itself but after so many
> (Search for "- -" and it is happening every 300.000
> Problem seems to have been introduced in "95ed329" - Move handling of
> sysv initscripts to a generator.
> This is entirely due to how the sysV generator links services, but I
> think GC slowness can happen on any complex system with many units
> linked with each other.
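One possible way such slowness could arise can be shown with a toy Python model. It rests on an assumption that is not confirmed anywhere in this thread: that a unit left UNSURE is swept again whenever another path reaches it, and only GOOD, BAD and IN_PATH units are skipped. The function name and the fully connected "referenced by" graph are hypothetical, chosen only to make the effect visible.

```python
GOOD, BAD, IN_PATH, UNSURE = "good", "bad", "in_path", "unsure"


def count_full_sweeps(n):
    """Sweep one unit of an n-unit clique; count the sweep calls that do real work."""
    marks = [None] * n
    calls = 0

    def sweep(u):
        nonlocal calls
        # Only decided units and units on the current path are skipped.
        # ASSUMPTION: an UNSURE unit is swept again when it is reached again.
        if marks[u] in (GOOD, BAD, IN_PATH):
            return
        calls += 1
        marks[u] = IN_PATH
        all_bad = True
        for other in range(n):    # every other unit references u
            if other == u:
                continue
            sweep(other)
            if marks[other] == GOOD:
                marks[u] = GOOD
                return
            if marks[other] != BAD:
                all_bad = False
        marks[u] = BAD if all_bad else UNSURE

    sweep(0)
    return calls
```

In this model the count for n mutually referencing units follows f(n) = 1 + (n - 1) * f(n - 1), so it grows roughly factorially: 2, 5 and 16 calls for n = 2, 3 and 4, and more than ten thousand for n = 8. Whether the unit graph produced by the sysV generator is anywhere near this dense is not something the model can tell.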
> >> We have this circular referenced by dependency between units and I am
> >> quite sure they are due to sysV compatibility.
> >> I know that systemd does not allow circular dependency between units
> >> (ex, wants, or after) but do we allow circular referenced by
> >> dependency? If so, then it is expected that manager_dispatch_gc_queue
> >> gets stuck.
> >> We can reproduce it on 216/217 when we isolate a target.
> >> Note: Line
> >> should be before
> >> since unit_gc_sweep() sets the u->in_gc_queue = true if it cannot make
> >> a decision and we set it back to false.
> > This is intended. After the sweep returned back to the anchor we can
> > make our decision: either add the unit to the cleanup queue, in which
> > case it should be removed from the GC queue, or it is reinstated as
> > a good unit that should continue to exist, in which case it should be
> > removed from the GC queue too.
> > Can't see a bug here...
> > Can you elaborate on how precisely you are encountering the GC loop?
> > Lennart
> > --
> > Lennart Poettering, Red Hat