[systemd-devel] Possible race condition for setting cgroup sticky bit

Lennart Poettering lennart at poettering.net
Fri Apr 5 12:11:22 PDT 2013


On Tue, 26.03.13 13:43, Anders Olofsson (anders.olofsson at axis.com) wrote:

heya, sorry for the delay.

> I'm seeing a problem with a service sometimes failing to start due to a missing cgroup.
> After some debugging I've made the following observations:
> 
> After exec_spawn() forks, the child will set the sticky bit for the
> cgroup (in cg_set_task_access) but sometimes, the cgroup is missing
> (lstat returns "No such file or directory").
> 
> The cgroup is always created, but the main process will call cg_trim
> (from cgroup_bonding_trim <- cgroup_bonding_trim_list <-
> cgroup_notify_empty <- private_bus_message_filter ...) which will
> remove the cgroup if the sticky bit isn't set.

Hmm, cg_trim() will ignore groups with the sticky bit set, and the
kernel won't let us remove a group that currently has a process in
it.
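
To make those two rules concrete, here is a minimal sketch of the trimming
behavior described above. It is not systemd's cg_trim(); trim_group() is a
hypothetical helper that operates on a single directory, and the EBUSY note
assumes a cgroup v1 hierarchy.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the trimming rule described in this mail;
 * not systemd's actual code.
 * Returns 1 if the group was removed, 0 if it was left alone because
 * the sticky bit is set, or a negative errno value on failure. */
int trim_group(const char *path) {
    struct stat st;

    assert(path != NULL);

    if (lstat(path, &st) < 0)
        return -errno;

    /* Groups marked sticky are skipped. */
    if (st.st_mode & S_ISVTX)
        return 0;

    /* Assumption: on a cgroup v1 mount the kernel refuses this rmdir()
     * (typically with EBUSY) while a process is still in the group. */
    if (rmdir(path) < 0)
        return -errno;

    return 1;
}
```

On an ordinary directory only the sticky bit check can be observed; the
"process still in the group" protection comes from the kernel and needs a
real cgroup mount.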

The code dealing with forked-off service processes in execute.c looks
like this: after forking, we first create a group, then add ourselves to
it, and then set the sticky bit on it. Now, there's a tiny window of
opportunity there (and we should fix it...) where cg_trim() from PID 1
could run between creating the group and adding ourselves to it. But
normally, if that fails, the execution of the service should be aborted
right away. But that's not what you are seeing?
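
As a rough illustration of that ordering and of where the window sits, the
three steps could be sketched as below. This is not the code from execute.c;
group_create(), group_attach_self(), group_set_sticky() and enter_group() are
hypothetical names, and joining via a "tasks" file assumes the cgroup v1
layout.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: create the group directory; an existing group is fine. */
int group_create(const char *path) {
    assert(path != NULL);
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -errno;
    return 0;
}

/* Step 2: join the group by writing our PID to its "tasks" file
 * (assumption: cgroup v1 layout, where the kernel provides the file). */
int group_attach_self(const char *path) {
    char file[PATH_MAX], pid[32];
    ssize_t w;
    int fd, n, r = 0;

    assert(path != NULL);
    n = snprintf(file, sizeof file, "%s/tasks", path);
    if (n < 0 || (size_t) n >= sizeof file)
        return -ENAMETOOLONG;

    fd = open(file, O_WRONLY);   /* no O_CREAT on purpose */
    if (fd < 0)
        return -errno;           /* ENOENT if the group was trimmed meanwhile */

    n = snprintf(pid, sizeof pid, "%ld\n", (long) getpid());
    w = write(fd, pid, (size_t) n);
    if (w < 0)
        r = -errno;
    else if (w != (ssize_t) n)
        r = -EIO;
    close(fd);
    return r;
}

/* Step 3: mark the group sticky so that trimming skips it. */
int group_set_sticky(const char *path) {
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) < 0)
        return -errno;
    if (chmod(path, (st.st_mode & 07777) | S_ISVTX) < 0)
        return -errno;
    return 0;
}

/* All three steps in order; the first failure aborts the sequence. */
int enter_group(const char *path) {
    int r;

    r = group_create(path);
    if (r < 0)
        return r;
    /* Window: a concurrent trim may remove the still-empty,
     * non-sticky group right here. */
    r = group_attach_self(path);
    if (r < 0)
        return r;
    return group_set_sticky(path);
}
```

If the group disappears inside the window, group_attach_self() fails with
-ENOENT and enter_group() returns that error without setting the sticky bit,
which corresponds to the "aborted right away" case.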

I will now add some code which avoids the race I pointed out, but I am
not sure that's the same one that you are actually encountering...

> This seems to be a race condition.  If the child sets the sticky bit
> first, the parent will leave the cgroup alone. But if the main process
> gets to cg_trim first, the cgroup is removed and the child fails.
> 
> We're using systemd 197. I've tried using 198, but there the child
> dies with SIGSEGV so it's harder to debug what's happening.  The
> problem appeared when we switched from Linux 3.4 to 3.7, but as this
> looks like a race in systemd, I'm not sure if our local kernel tree
> is to blame or if the version bump just changed the timing to trigger
> the race in systemd.
> 
> Since I'm not familiar with the systemd internals and cgroups I would
> appreciate some help to resolve this.
> 
> I can reproduce this pretty easily, usually within 5-10 boots. It's
> always the same service that fails and the services before it never
> fail.

A temporary work-around could be to precreate the cgroup dir early and
set the sticky bit on it, so that systemd never removes it...
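
A minimal sketch of that work-around, assuming the group path is known in
advance (the thread does not give the exact location) and that something
runs this early in boot. precreate_sticky_group() is a hypothetical helper,
not part of systemd.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: make sure the group directory exists and carries
 * the sticky bit. Safe to call more than once.
 * Returns 0 on success or a negative errno value on failure. */
int precreate_sticky_group(const char *path) {
    struct stat st;

    assert(path != NULL);

    /* An already existing directory is not an error. */
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -errno;

    if (lstat(path, &st) < 0)
        return -errno;
    if (!S_ISDIR(st.st_mode))
        return -ENOTDIR;

    /* Set the bit with an explicit chmod() rather than relying on the
     * mode argument of mkdir(), which is filtered by the umask. */
    if (chmod(path, (st.st_mode & 07777) | S_ISVTX) < 0)
        return -errno;

    return 0;
}
```

The caller would pass the service's group directory under the mounted
cgroup hierarchy; since the helper is idempotent, it does not matter if the
directory already exists by the time it runs.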

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
