[systemd-devel] systemctl [en|dis]able weirdness + reload (writes /run/nologin)

Mon Jan 13 07:33:35 PST 2014

'Twas brillig, and Colin Guthrie at 13/01/14 15:08 did gyre and gimble:
> 'Twas brillig, and Colin Guthrie at 13/01/14 11:30 did gyre and gimble:
>> 'Twas brillig, and Colin Guthrie at 13/01/14 11:16 did gyre and gimble:
>>> 2. This is the much odder part of the problem I'm seeing. The call to
>>> daemon_reload() at the end of the enable_unit() seems to trigger some
>>> kind of broken daemon reload that puts things into a bad state,
>>> including a stale /run/nologin file.
>>>
>>> I'm not sure WHY this does this, but it's very reliably reproducible. I
>>> have a native sysvinit script called numlock. All I need to do to
>>> trigger the bad state is "systemctl disable numlock". After the call,
>>> the systemd daemon is reloaded and it goes into this bad state,
>>> complete with a /run/nologin file.
>>>
>>> If I comment out the call or use --no-reload, then all is well. If I call
>>> "systemctl daemon-reload" on its own, all seems well. It just seems to
>>> be this reload call specifically at the end of enable_unit() that
>>> triggers the bad state.
>>>
>>>
>>>
>>> I'm going to try reverting some of the patches I have applied to see
>>> where I get with things, as I see Zbigniew backed a few out of fedora
>>> due to freeze rules, but I did also see some threads from Zbigniew about
>>> the whole /run/nologin, so I suspect he may be interested in this.
>>
>> I reverted the same patches that were reverted in Fedora so our builds
>> should be quite similar.
>>
>> I really hope fedora has this same issue otherwise my debugging just got
>> more confusing.
>>
>> Zbigniew can you reproduce this on F20?
> 
> It seems to be specifically related to chkconfig. If I shell out instead
> to something different (e.g. "whoami") all runs fine.
> 
> I'm wondering if it's something related to semi-systemd stuff supported
> in our chkconfig... Perhaps our patches are out of date compared to
> fedora...
> 
> Still hunting :)

OK, so it seems that chkconfig will these days notify systemd to reload
itself.

static void reloadSystemd(void) {
    if (systemdActive())
        system("systemctl daemon-reload > /dev/null 2>&1");
}

For whatever reason, doing this in the forked off process AND in systemd
itself leads to some kind of race.

Perhaps this happens if two reload operations come in in very quick
succession, or perhaps the use of system() (and its subsequent fork) in
chkconfig just somehow messes up our signal handling in systemctl?

Either way, commenting this out in systemctl avoids the problem.

I would suggest three possible fixes:

1. Find out why this is racy and fix it.
2. Add an option to chkconfig to disable the reload.
3. Just drop the reload completely from chkconfig.

I would suspect that the option route is the best way forward, but
chkconfig will bail out on unsupported options, so we should either use
an environment variable or make sure systemd and chkconfig are updated
in lockstep.
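
To make the environment variable idea concrete, here is a rough sketch of
how chkconfig's reloadSystemd() could honour one. This is only an
illustration, not a tested patch: the variable name
CHKCONFIG_NO_SYSTEMD_RELOAD and the reloadSuppressed() helper are made up
for this mail, systemdActive() is stubbed out so the snippet compiles on
its own, and the return value is added purely so the behaviour can be
checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; nothing in chkconfig or systemd defines it. */
#define NO_RELOAD_ENV "CHKCONFIG_NO_SYSTEMD_RELOAD"

/* Stand-in for chkconfig's real systemdActive(); assumed here to return
 * non-zero when systemd is the running init. */
static int systemdActive(void) {
    return 1;
}

/* Hypothetical helper: returns 1 when the caller asked us to skip the
 * reload, i.e. the variable is set to anything other than "" or "0". */
static int reloadSuppressed(void) {
    const char *v = getenv(NO_RELOAD_ENV);
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/* Same shape as the existing function, plus the opt-out check.
 * Returns 1 if a reload was attempted, 0 if it was skipped. */
static int reloadSystemd(void) {
    int rc;

    if (!systemdActive() || reloadSuppressed())
        return 0;

    rc = system("systemctl daemon-reload > /dev/null 2>&1");
    (void)rc; /* the original code ignores the exit status too */
    return 1;
}
```

With something like this, systemctl could set the variable (for example
with setenv()) before it forks chkconfig and then do its own single reload
afterwards. An older chkconfig would simply ignore the variable, so the
two packages would not have to be updated in lockstep.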

Thoughts?

Col

-- 

Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
  Tribalogic Limited http://www.tribalogic.net/
Open Source:
  Mageia Contributor http://www.mageia.org/
  PulseAudio Hacker http://www.pulseaudio.org/
  Trac Hacker http://trac.edgewall.org/