[systemd-devel] Possible race condition for setting cgroup sticky bit

Lennart Poettering lennart at poettering.net
Mon Apr 8 05:18:43 PDT 2013


On Fri, 05.04.13 22:04, Anders Olofsson (anders.olofsson at axis.com) wrote:

> 
> > > I'm seeing a problem with a service sometimes failing to start due to a
> > > missing cgroup.
> > > After some debugging I've made the following observations:
> > >
> > > After exec_spawn() forks, the child will set the sticky bit for the
> > > cgroup (in cg_set_task_access) but sometimes, the cgroup is missing
> > > (lstat returns "No such file or directory").
> > >
> > > The cgroup is always created, but the main process will call cg_trim
> > > (from cgroup_bonding_trim <- cgroup_bonding_trim_list <-
> > > cgroup_notify_empty <- private_bus_message_filter ...) which will
> > > remove the cgroup if the sticky bit isn't set.
> > 
> > Hmm, cg_trim() will ignore groups with the sticky bit set, and the
> > kernel won't allow us to remove groups that currently have a process
> > in them.
> 
> I've dumped data from cg_trim and the sticky bit is not set when this
> occurs. In fact, the state of the sticky bit as seen by cg_trim seems
> to be the major difference between a proper boot or a broken one.

Well, but as long as there is a process in the group the kernel should
already refuse deletion of the group. The sticky bit is hence useful
only for *empty* cgroups, which is what I don't grok here... In your
case the child should have created the group and made itself a member of
it immediately (with a tiny window in between where the group could be
removed, but this should result in immediate total failure of the
forking, not just a missing cgroup).
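
To make the trimming rule concrete, here is a minimal sketch in Python
that simulates it on ordinary directories in a temporary location. It is
not systemd's code: set_sticky(), trim() and demo() are hypothetical
names, and a plain directory cannot reproduce the kernel refusing to
remove a cgroup that still has tasks in it (that case is only noted in a
comment).

```python
import os
import stat
import tempfile


def set_sticky(path):
    """Set the sticky bit on a directory, keeping its other mode bits."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode | stat.S_ISVTX)


def trim(path):
    """Remove an empty directory unless its sticky bit is set.

    Returns True if the directory was removed, False if it was kept.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False  # already gone, nothing to trim
    if st.st_mode & stat.S_ISVTX:
        return False  # marked as "keep even when empty"
    try:
        os.rmdir(path)
    except OSError:
        # Not removable. For a real cgroup this is where rmdir() fails
        # with EBUSY while the group still has member tasks.
        return False
    return True


def demo():
    """Create one unmarked and one sticky directory, then trim both."""
    with tempfile.TemporaryDirectory() as root:
        unmarked = os.path.join(root, "a.service")  # hypothetical names
        marked = os.path.join(root, "b.service")
        os.mkdir(unmarked)
        os.mkdir(marked)
        set_sticky(marked)
        removed_unmarked = trim(unmarked)
        removed_marked = trim(marked)
        return (removed_unmarked, removed_marked,
                os.path.isdir(unmarked), os.path.isdir(marked))
```

In this simulation the unmarked empty directory is removed and the
sticky one survives, which is the only situation in which the sticky bit
makes a difference.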

> > The code dealing with forked off service processes in execute.c looks
> > like this: after forking, we first create a group, then add us to it,
> > and then set the sticky bit for it. Now, there's a tiny window of
> > opportunity there (and we should fix it...) where cg_trim from PID 1
> > could run in between, that is, between creating the group and adding
> > us to it. But normally, if that fails then the execution of the service
> > should be aborted right away. But that's not what you are seeing?
> > 
> > I will now add some code which avoids the race I pointed out, but I am
> > not sure that's the same one that you are actually encountering...
> 
> The cgroup that fails is named after the services. But the service is
> configured to use the same cgroup as several other services
> (ControlGroup= is set in the service file).  In this setup, is the
> child created in the default cgroup and then moved to the configured
> one or why is the default named cgroup existing at all and being
> handled?

No, if you configured a cgroup name then no "default" cgroup naming
is ever attempted.

Hmm, which hierarchy are you talking about BTW? Note that cgroup
memberships in all hierarchies are pretty much orthogonal on the
kernel-side of things. And systemd will allow you that too.

> I've noticed that cgroups always exist for all services,
> regardless of whether they are overridden to use another one.

Really? Maybe in different hierarchies?

It would certainly be a bug if systemd ever creates a cgroup in the "cpu"
hierarchy that is not the one you configured for the "cpu"
hierarchy.

Any chance you can explain in a bit more detail how your cgroups are set
up and what unit configuration switches you use for that?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

