[systemd-devel] Possible race condition for setting cgroup sticky bit

Anders Olofsson anders.olofsson at axis.com
Wed Apr 3 07:33:26 PDT 2013


I would really appreciate some help with this from someone who's familiar with the systemd internals.

What mechanism is there to prevent cg_trim from removing a cgroup before the newly created child has completed cg_set_task_access?

I've created bug 63080 for this as well.

/Anders

> -----Original Message-----
> From: systemd-devel-bounces+anders.olofsson=axis.com at lists.freedesktop.org
> [mailto:systemd-devel-bounces+anders.olofsson=axis.com at lists.freedesktop.org]
> On Behalf Of Anders Olofsson
> Sent: 27 March 2013 13:58
> To: systemd-devel at lists.freedesktop.org
> Subject: Re: [systemd-devel] Possible race condition for setting cgroup sticky bit
> 
> I just tested it with systemd 199 and the problem still occurs.
> 
> However, it now fails with "Failed at step CGROUP spawning /etc/init.d/rc:
> No such file or directory", just as in 197, and not with the segfault I saw
> (at least sometimes) with 198.
> 
> /Anders
> 
> > -----Original Message-----
> > From: systemd-devel-bounces+anders.olofsson=axis.com at lists.freedesktop.org
> > [mailto:systemd-devel-bounces+anders.olofsson=axis.com at lists.freedesktop.org]
> > On Behalf Of Anders Olofsson
> > Sent: 26 March 2013 13:43
> > To: systemd-devel at lists.freedesktop.org
> > Subject: [systemd-devel] Possible race condition for setting cgroup sticky bit
> >
> > I'm seeing a problem with a service sometimes failing to start due to a
> > missing cgroup.
> > After some debugging I've made the following observations:
> >
> > After exec_spawn() forks, the child sets the sticky bit for the cgroup (in
> > cg_set_task_access), but sometimes the cgroup is missing (lstat returns
> > "No such file or directory").
> >
> > The cgroup is always created, but the main process will call cg_trim (from
> > cgroup_bonding_trim <- cgroup_bonding_trim_list <- cgroup_notify_empty <-
> > private_bus_message_filter ...), which will remove the cgroup if the sticky
> > bit isn't set.
> >
> > This seems to be a race condition.
> > If the child sets the sticky bit first, the parent will leave the cgroup
> > alone. But if the main process gets to cg_trim first, the cgroup is removed
> > and the child fails.
> >
> > We're using systemd 197. I've tried 198, but there the child dies with
> > SIGSEGV, so it's harder to debug what's happening.
> > The problem appeared when we switched from Linux 3.4 to 3.7, but as this
> > looks like a race in systemd, I'm not sure whether our local kernel tree is
> > to blame or the version bump just changed the timing enough to trigger the
> > race in systemd.
> >
> > Since I'm not familiar with the systemd internals and cgroups I would
> > appreciate some help to resolve this.
> >
> > I can reproduce this pretty easily, usually within 5-10 boots. It's always
> > the same service that fails, and the services before it never fail.
> >
> > /Anders
> > _______________________________________________
> > systemd-devel mailing list
> > systemd-devel at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/systemd-devel
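The race described in this thread can be reproduced outside systemd. The following is a minimal Python sketch, assuming a plain temporary directory stands in for the cgroup; `set_task_access`, `trim` and `spawn` are hypothetical stand-ins modeled on the descriptions of cg_set_task_access and cg_trim above, not systemd code. A pipe forces the losing interleaving, in which the parent trims before the child has marked the group. Setting the sticky bit in the parent before fork() is shown as one possible mitigation; it is an assumption for illustration, not necessarily what systemd adopted.

```python
import errno
import os
import stat
import tempfile


def set_task_access(path):
    """Hypothetical stand-in for cg_set_task_access(): mark the group
    directory sticky so the trimmer leaves it alone.
    Returns 0 on success or a negative errno, C style."""
    try:
        st = os.lstat(path)
        os.chmod(path, stat.S_IMODE(st.st_mode) | stat.S_ISVTX)
    except OSError as e:
        return -e.errno  # -ENOENT is the failure described in the report
    return 0


def trim(path):
    """Hypothetical stand-in for cg_trim(): remove the directory unless
    its sticky bit is set. Returns True if the directory was removed."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False
    if st.st_mode & stat.S_ISVTX:
        return False  # group is marked: leave it alone
    os.rmdir(path)
    return True


def spawn(group, mark_in_parent):
    """Simulate the exec_spawn() window: create the group, fork a child
    that calls set_task_access(), and force the losing interleaving in
    which the parent calls trim() before the child gets to run.
    Returns the child's exit code: 0 on success, otherwise the errno."""
    os.mkdir(group)
    if mark_in_parent:
        set_task_access(group)  # assumed mitigation: mark before fork()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(w)
        os.read(r, 1)  # block until the parent has trimmed
        os._exit(-set_task_access(group))  # exit code = errno (0 on success)
    os.close(r)
    trim(group)  # parent reaches cg_trim first
    os.write(w, b"x")  # now let the child continue
    os.close(w)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        rc = spawn(os.path.join(base, "racy"), mark_in_parent=False)
        print("racy:", errno.errorcode.get(rc, rc))  # racy: ENOENT
        rc = spawn(os.path.join(base, "fixed"), mark_in_parent=True)
        print("fixed:", rc)  # fixed: 0
```

Marking the group before fork() closes the window because no interleaving of parent and child can then observe an unmarked directory; the child's own call becomes idempotent. An alternative would be a handshake in which the parent defers trimming until the child reports that it is done, at the cost of extra synchronization.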

