[systemd-devel] Possible race condition for setting cgroup sticky bit
Anders Olofsson
anders.olofsson at axis.com
Wed Mar 27 05:58:18 PDT 2013
I just tested it with systemd 199 and the problem still occurs.
However, it now fails with "Failed at step CGROUP spawning /etc/init.d/rc: No such file or directory", just like in 197, and not with a segfault as I saw (at least sometimes) with 198.
/Anders
> -----Original Message-----
> From: systemd-devel-
> bounces+anders.olofsson=axis.com at lists.freedesktop.org [mailto:systemd-
> devel-bounces+anders.olofsson=axis.com at lists.freedesktop.org] On Behalf
> Of Anders Olofsson
> Sent: 26 March 2013 13:43
> To: systemd-devel at lists.freedesktop.org
> Subject: [systemd-devel] Possible race condition for setting cgroup sticky bit
>
> I'm seeing a problem with a service sometimes failing to start due to a
> missing cgroup.
> After some debugging I've made the following observations:
>
> After exec_spawn() forks, the child will set the sticky bit for the cgroup (in
> cg_set_task_access) but sometimes, the cgroup is missing (lstat returns "No
> such file or directory").
>
> The cgroup is always created, but the main process will call cg_trim (from
> cgroup_bonding_trim <- cgroup_bonding_trim_list <- cgroup_notify_empty
> <- private_bus_message_filter ...) which will remove the cgroup if the sticky
> bit isn't set.
>
> This seems to be a race condition.
> If the child sets the sticky bit first, the parent will leave the cgroup alone. But
> if the main process gets to cg_trim first, the cgroup is removed and the child
> fails.
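
To make the two orderings concrete, here is a minimal sketch of the sequence described above. It is not systemd code: it uses an ordinary temporary directory in place of a cgroup, and the names set_sticky, trim, run and "example.service" are made up for illustration. It only assumes what the quoted text says, namely that the child sets the sticky bit after an lstat and that the trimming side removes the directory when the bit is not set.

```python
import os
import stat
import tempfile


def set_sticky(path):
    """Child side (hypothetical stand-in for the sticky-bit step)."""
    # lstat raises FileNotFoundError if the directory was already removed.
    mode = os.lstat(path).st_mode
    os.chmod(path, stat.S_IMODE(mode) | stat.S_ISVTX)


def trim(path):
    """Parent side (hypothetical stand-in for the trim step).

    Removes the directory unless the sticky bit is set.
    Returns True if the directory was removed.
    """
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False
    if mode & stat.S_ISVTX:
        return False
    os.rmdir(path)
    return True


def run(order):
    """Run the child and parent steps in the given order and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        group = os.path.join(root, "example.service")  # plain directory, not a real cgroup
        os.mkdir(group)
        result = {}
        for step in order:
            if step == "child":
                try:
                    set_sticky(group)
                    result["child"] = "ok"
                except FileNotFoundError:
                    result["child"] = "ENOENT"
            else:
                result["parent_removed"] = trim(group)
        return result


if __name__ == "__main__":
    print(run(["child", "parent"]))
    print(run(["parent", "child"]))
```

With the child first, the directory survives the trim. With the parent first, the directory is gone by the time the child calls lstat, which matches the "No such file or directory" failure I am seeing.
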
>
> We're using systemd 197. I've tried using 198, but there the child dies with
> SIGSEGV so it's harder to debug what's happening.
> The problem appeared when we switched from Linux 3.4 to 3.7, but as this
> looks like a race in systemd, I'm not sure if our local kernel tree is to blame
> or if the version bump just changed the timing to trigger the race in systemd.
>
> Since I'm not familiar with the systemd internals and cgroups I would
> appreciate some help to resolve this.
>
> I can reproduce this pretty easily, usually within 5-10 boots. It's always the
> same service that fails and the services before it never fail.
>
> /Anders