[systemd-devel] forever loop during garbage collection

Umut Tezduyar Lindskog umut at tezduyar.com
Wed Dec 10 06:22:42 PST 2014


On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (umut at tezduyar.com) wrote:
>
>> Hi,
>>
>> We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
>> Problem happens when systemd runs in sysV compatibility mode (Porky
>> enables this).
>>
>> Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
>> like unit_gc_sweep cannot make a decision about the unit. As a result,
>> it marks the unit with offset_unsure and adds the unit back to gc
>> queue.
>>
>> If I am reading the code correctly recursive unit_gc_sweep will never
>> be able to remove the unit from the gc queue if it is referenced by
>> another unit and if another unit is referenced by the unit.
>>
>> A is referenced by B
>> B is referenced by A
>
> So in this case first A will be processed by the GC sweep, it will
> follow the link to B while setting the state to IN_PATH and invoke the
> GC sweep on that. B will then be set to IN_PATH too. GC sweep now
> follows its link back, and ends up at A again, but this time returns
> quickly because its state is set to IN_PATH. Due to this, it will then
> set B's state to UNSURE, and return to A, which in effect will now be
> set to UNSURE too. Now, we return into the GC queue dispatch call,
> which will notice that it is UNSURE and upgrade that to BAD, and kill
> it because there's nothing in the unit's dependency network that is
> clearly GOOD, and hence it should be removed.
>
> The essence of cycle breaking here is really in
> manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
> am not seeing how this could end up in an endless loop hence.

I have debugged it more and, as you said, there is no bug in the code.
However, it takes so long to get out of unit_gc_sweep that I thought
it was an endless loop.
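
To check that I read your explanation of the A/B case correctly, here
is a toy model of the sweep in Python. It is only a sketch of the walk
you describe, not the systemd code: Unit, pinned, sweep and dispatch
are names made up for the illustration, and "pinned" is a hypothetical
stand-in for whatever makes a unit clearly GOOD.

```python
# Toy model of the GC sweep described in this thread; NOT the systemd code.
GOOD, BAD, IN_PATH, UNSURE = "GOOD", "BAD", "IN_PATH", "UNSURE"


class Unit:
    def __init__(self, name, pinned=False):
        self.name = name
        self.pinned = pinned      # hypothetical: unit must be kept
        self.referenced_by = []   # units that reference this one
        self.state = None         # marker for the current GC run


def sweep(u):
    # Decided units and units already on the current path are not revisited.
    if u.state in (GOOD, BAD, IN_PATH):
        return
    if u.pinned:
        u.state = GOOD
        return
    u.state = IN_PATH
    all_bad = True
    for other in u.referenced_by:
        sweep(other)
        if other.state == GOOD:
            u.state = GOOD        # a GOOD referencer keeps this unit alive
            return
        if other.state != BAD:
            all_bad = False
    # No GOOD referencer found: BAD if every referencer is BAD, else undecided.
    u.state = BAD if all_bad else UNSURE


def dispatch(queue):
    # The anchor of each sweep is upgraded from UNSURE to BAD,
    # which is what breaks reference cycles.
    removed = []
    for u in queue:
        sweep(u)
        if u.state == UNSURE:
            u.state = BAD
        if u.state == BAD:
            removed.append(u.name)
    return removed


a, b = Unit("A"), Unit("B")
a.referenced_by.append(b)   # A is referenced by B
b.referenced_by.append(a)   # B is referenced by A
removed = dispatch([a, b])
print(removed)
```

In this model both units of the cycle are collected, so the cycle by
itself does not cause an endless loop.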

Attached is my debugging patch against 216, and
https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
is part of the log produced with the patch applied.

It has been 3 hours since I issued "systemctl isolate", and according
to the logs the garbage collection logic is slowly making its way
back up. I guess it will eventually resolve itself, but only after
many hours.

(Search the log for "-                     -"; it occurs every 300,000 lines.)

The problem seems to have been introduced in commit "95ed329" (Move
handling of sysv initscripts to a generator).

This is entirely due to how the sysV generator links services, but I
think the same GC slowness can happen on any complex system with many
units linked to each other.
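
To illustrate how fast the work can grow, here is a toy Python model
that counts sweep invocations on a group of units where every unit is
referenced by every other one. It is a sketch under my own
assumptions, not the systemd code: count_sweep_calls is a made-up
name, and the model assumes that a unit left UNSURE is swept again
each time it is reached from another path.

```python
# Toy model; NOT the systemd code. Assumes UNSURE units are re-swept.
GOOD, BAD, IN_PATH, UNSURE = range(4)


def count_sweep_calls(n):
    """Sweep one unit of an n-unit fully cross-referenced group
    and return the number of sweep invocations."""
    referenced_by = {u: [v for v in range(n) if v != u] for u in range(n)}
    state = {}
    calls = 0

    def sweep(u):
        nonlocal calls
        calls += 1
        # Only decided units and units on the current path return early.
        if state.get(u) in (GOOD, BAD, IN_PATH):
            return
        state[u] = IN_PATH
        all_bad = True
        for other in referenced_by[u]:
            sweep(other)
            if state[other] != BAD:
                all_bad = False
        state[u] = BAD if all_bad else UNSURE

    sweep(0)
    return calls


for n in range(2, 9):
    print(n, count_sweep_calls(n))
```

In this model the number of invocations grows much faster than the
number of units, which would fit a sweep that takes hours without
being a real endless loop.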

Thoughts?
Umut

>
>>
>> We have this circular referenced by dependency between units and I am
>> quite sure they are due to sysV compatibility.
>>
>> I know that systemd does not allow circular dependency between units
>> (ex, wants, or after) but do we allow circular referenced by
>> dependency? If so, then it is expected that manager_dispatch_gc_queue
>> gets stuck.
>>
>> We can reproduce it on 216/217 when we isolate a target.
>>
>> Note: Line http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
>> should be before
>> http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
>> since unit_gc_sweep() sets the u->in_gc_queue = true if it cannot make
>> a decision and we set it back to false.
>
> This is intended. After the sweep returned back to the anchor we can
> make our decision: either add the unit to the cleanup queue in which
> case it should be removed from the GC queue, or it is reinstated as
> a good unit that should continue to exist, in which case it should be
> removed from the GC queue too.
>
> Can't see a bug here...
>
> Can you elaborate on how precisely you are encountering the GC loop?
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Debugging-gc_sweep 1.03.18 PM.patch
Type: application/octet-stream
Size: 3933 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20141210/5218452d/attachment.obj>

