[systemd-devel] forever loop during garbage collection

Lennart Poettering lennart at poettering.net
Mon Dec 8 11:09:26 PST 2014


On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (umut at tezduyar.com) wrote:

> Hi,
> 
> We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
> Problem happens when systemd runs in sysV compatibility mode (Porky
> enables this).
> 
> Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
> like unit_gc_sweep cannot make a decision about the unit. As a result,
> it marks the unit with offset_unsure and adds the unit back to gc
> queue.
> 
> If I am reading the code correctly recursive unit_gc_sweep will never
> be able to remove the unit from the gc queue if it is referenced by
> another unit and if another unit is referenced by the unit.
> 
> A is referenced by B
> B is referenced by A

So in this case A will be processed by the GC sweep first. It will set
A's state to IN_PATH, follow the link to B and invoke the GC sweep on
that. B will then be set to IN_PATH too. The GC sweep now follows B's
link back and ends up at A again, but this time it returns quickly
because A's state is already IN_PATH. Due to this, it will then set B's
state to UNSURE and return to A, which in effect will now be set to
UNSURE too. Now we return into the GC queue dispatch call, which will
notice that A is UNSURE and upgrade that to BAD, and kill it, because
there's nothing in the unit's dependency network that is clearly
GOOD, and hence it should be removed.

The essence of cycle breaking here is really in
manager_dispatch_gc_queue(), which upgrades UNSURE to BAD in the end.
Hence I am not seeing how this could end up in an endless loop.
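
To make the walk-through above concrete, here is a rough toy model of
the sweep and the dispatch loop in Python. It is only a sketch of the
logic as described in this mail, not the actual code from manager.c:
the names Unit, Mark, enqueue, sweep and dispatch_gc_queue, and the
"needed" flag standing in for "this unit is clearly worth keeping", are
all made up for illustration, and the per-run marker offsets are left
out because only a single dispatch run is modelled.

```python
from enum import Enum, auto


class Mark(Enum):
    NONE = auto()
    UNSURE = auto()
    IN_PATH = auto()
    GOOD = auto()
    BAD = auto()


class Unit:
    def __init__(self, name, needed=False):
        self.name = name
        self.needed = needed        # hypothetical stand-in for "clearly GOOD"
        self.referenced_by = []     # units that hold a reference to this one
        self.mark = Mark.NONE
        self.in_gc_queue = False


def enqueue(queue, unit):
    """Add a unit to the GC queue unless it is already queued."""
    if not unit.in_gc_queue:
        unit.in_gc_queue = True
        queue.append(unit)


def sweep(unit, queue):
    """Recursive sweep: decide GOOD/BAD, or give up with UNSURE."""
    if unit.mark in (Mark.GOOD, Mark.BAD, Mark.IN_PATH):
        return                      # already decided, or we hit a cycle
    if unit.needed:
        unit.mark = Mark.GOOD
        return
    unit.mark = Mark.IN_PATH
    all_bad = True
    for other in unit.referenced_by:
        sweep(other, queue)
        if other.mark is Mark.GOOD:
            unit.mark = Mark.GOOD   # something useful still references us
            return
        if other.mark is not Mark.BAD:
            all_bad = False
    if all_bad:
        unit.mark = Mark.BAD
        return
    unit.mark = Mark.UNSURE         # no decision possible, look again later
    enqueue(queue, unit)


def dispatch_gc_queue(queue):
    """Drain the queue; returns the names of the collected units."""
    collected = []
    while queue:
        unit = queue[0]
        sweep(unit, queue)
        # The anchor always leaves the queue after its sweep ...
        queue.remove(unit)
        unit.in_gc_queue = False
        # ... and UNSURE is upgraded to BAD here, which breaks the cycle.
        if unit.mark in (Mark.BAD, Mark.UNSURE):
            unit.mark = Mark.BAD
            collected.append(unit.name)
    return collected


if __name__ == "__main__":
    a, b = Unit("A"), Unit("B")
    a.referenced_by.append(b)       # A is referenced by B
    b.referenced_by.append(a)       # B is referenced by A
    gc_queue = []
    enqueue(gc_queue, a)
    print(dispatch_gc_queue(gc_queue))  # ['A', 'B']
```

In this model every pass through the loop removes the anchor from the
queue, and a unit that has reached GOOD or BAD is never queued again,
so the loop terminates even for the A/B cycle.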

> 
> We have this circular referenced by dependency between units and I am
> quite sure they are due to sysV compatibility.
> 
> I know that systemd does not allow circular dependency between units
> (ex, wants, or after) but do we allow circular referenced by
> dependency? If so, then it is expected that manager_dispatch_gc_queue
> gets stuck.
> 
> We can reproduce it on 216/217 when we isolate a target.
> 
> Note: Line http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
> should be before
> http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
> since unit_gc_sweep() sets the u->in_gc_queue = true if it cannot make
> a decision and we set it back to false.

This is intended. After the sweep has returned to the anchor we can
make our decision: either the unit is added to the cleanup queue, in
which case it should be removed from the GC queue, or it is reinstated
as a good unit that should continue to exist, in which case it should
be removed from the GC queue too.

Can't see a bug here...

Can you elaborate on how precisely you are encountering the GC loop?

Lennart

-- 
Lennart Poettering, Red Hat

