[systemd-devel] Possible race condition for setting cgroup sticky bit

Anders Olofsson anders.olofsson at axis.com
Fri Apr 5 13:04:48 PDT 2013


> > I'm seeing a problem with a service sometimes failing to start due to a
> > missing cgroup.
> > After some debugging I've made the following observations:
> >
> > After exec_spawn() forks, the child will set the sticky bit for the
> > cgroup (in cg_set_task_access) but sometimes, the cgroup is missing
> > (lstat returns "No such file or directory").
> >
> > The cgroup is always created, but the main process will call cg_trim
> > (from cgroup_bonding_trim <- cgroup_bonding_trim_list <-
> > cgroup_notify_empty <- private_bus_message_filter ...) which will
> > remove the cgroup if the sticky bit isn't set.
> 
> Hmm, cg_trim() will ignore groups with the sticky bit set, and the
> kernel won't allow us to remove groups that currently have a process
> in them.

I've dumped data from cg_trim, and the sticky bit is not set when this occurs. In fact, the state of the sticky bit as seen by cg_trim seems to be the major difference between a good boot and a broken one.
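
For anyone wanting to collect the same kind of data from outside systemd, here is a minimal sketch that reports the sticky-bit state of every directory below a given root. The function name sticky_report is hypothetical, and the root to inspect (wherever the cgroup hierarchy is mounted on your system) is an assumption left to the caller; this is not how cg_trim itself does it.

```python
import os
import stat


def sticky_report(root):
    """Return {relative_dir: has_sticky_bit} for root and every directory below it."""
    report = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        # lstat, so a symlink is never followed when reading the mode bits
        mode = os.lstat(dirpath).st_mode
        report[os.path.relpath(dirpath, root)] = bool(mode & stat.S_ISVTX)
    return report
```

Running it once on a good boot and once on a broken one and comparing the two dictionaries should show which groups differ.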

> The code dealing with forked off service processes in execute.c looks
> like this: after forking, we first create a group, then add us to it,
> and then set the sticky bit for it. Now, there's a tiny window of
> opportunity there (and we should fix it...) where cg_trim from PID 1
> could run between creating a group and adding us to it. But normally,
> if that fails then the execution of the service should be aborted
> right away. But that's not what you are seeing?
> 
> I will now add some code which avoids the race I pointed out, but I am
> not sure that's the same one that you are actually encountering...
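
The window described above can be modelled with ordinary directories. This is only a sketch under loud assumptions: trim() is a hypothetical stand-in for cg_trim, a plain directory stands in for a cgroup, and a marker file stands in for "a process is in the group" (a non-empty directory cannot be removed with rmdir, much as the kernel refuses to remove an occupied group). It illustrates the create/join window only, not necessarily the failure reported in this thread.

```python
import os
import stat
import tempfile


def trim(path):
    """Remove path unless it is sticky or occupied (hypothetical stand-in for cg_trim)."""
    try:
        if os.lstat(path).st_mode & stat.S_ISVTX:
            return False      # sticky bit set: leave the group alone
        os.rmdir(path)        # raises if the directory is not empty
        return True
    except OSError:
        return False


def set_sticky(path):
    """Set the sticky bit, the child's last step; raises if the group is gone."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode | stat.S_ISVTX)


def run(trim_before_join):
    """Play the child's three steps, with trim() either inside or after the window."""
    with tempfile.TemporaryDirectory() as root:
        group = os.path.join(root, "example.service")
        os.mkdir(group)                                       # 1. create the group
        if trim_before_join:
            trim(group)                                       # trim hits the window
        try:
            open(os.path.join(group, "member"), "w").close()  # 2. join the group
            if not trim_before_join:
                trim(group)                                   # occupied: not removed
            set_sticky(group)                                 # 3. set the sticky bit
        except FileNotFoundError:
            return "missing"
        return "ok"
```

In this model run(trim_before_join=True) returns "missing" because the group is removed before the join, while run(trim_before_join=False) returns "ok" because the occupied group survives the trim.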

The cgroup that fails is named after the service. But the service is configured to use the same cgroup as several other services (ControlGroup= is set in the service file).
In this setup, is the child first created in the default cgroup and then moved to the configured one? If not, why does the default-named cgroup exist at all, and why is it being handled?

I've noticed that cgroups always exist for all services, regardless of whether they are overridden to use another one.


> A temporary work-around could be to precreate the cgroup dir early and
> set the sticky bit on it, so that systemd won't kill it ever...

Thanks, I'll try that. I guess it can't be added to ExecStartPre of the same service, though, without risking the same problem.
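
A minimal sketch of that workaround, assuming it runs early enough in boot and with sufficient privileges. The helper name precreate_sticky_group is hypothetical, and the path to pass in (the service's directory inside the mounted cgroup hierarchy) is not given in this thread, so it is left as a parameter.

```python
import os
import stat


def precreate_sticky_group(path):
    """Create the directory if needed and set its sticky bit, keeping other mode bits."""
    os.makedirs(path, exist_ok=True)
    # S_IMODE strips the file-type bits so only permission bits reach chmod
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode | stat.S_ISVTX)
    return path
```

The call is idempotent, so it should be safe to run on every boot whether or not the directory already exists.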

/Anders

