[systemd-bugs] [Bug 63080] New: Race condition setting cgroup sticky bit

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Apr 3 07:21:54 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=63080

          Priority: medium
            Bug ID: 63080
          Assignee: systemd-bugs at lists.freedesktop.org
           Summary: Race condition setting cgroup sticky bit
        QA Contact: systemd-bugs at lists.freedesktop.org
          Severity: major
    Classification: Unclassified
                OS: Linux (All)
          Reporter: Anders.Olofsson at axis.com
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: general
           Product: systemd

After switching to Linux 3.7, I'm seeing a service sometimes fail to start
due to its cgroup not being present.
After some investigation and some added debug prints I see the following
happening:

1. exec_spawn forks to spawn the new process

2. Pid 1 continues to run and enters cg_trim for the cgroup belonging to the
new process, checks for the sticky bit (which isn't set yet) and removes the
cgroup. I've traced the call path as: private_bus_message_filter ->
cgroup_notify_empty -> cgroup_bonding_trim_list -> cgroup_bonding_trim ->
cg_trim

3. Child enters cg_set_task_access where it fails because the cgroup has been
removed

4. The service is failed with the following error:
Failed at step CGROUP spawning /etc/init.d/rc: No such file or directory


Tested and reproduced with systemd 197 and 199.

Happens with Linux 3.7, but not with 3.6 or lower.
This is an embedded system using a local MIPS port for the kernel, so it might
be a kernel problem. However, my guess is that this is not a kernel bug but a
scheduling change in the kernel that lets the parent run before the child
after fork(), which exposes the race.
We also have an ARM port where we're not seeing the problem, but this might not
be reliable as most tests have been run on the MIPS system.

I can easily reproduce the fault within 5-10 boots. It's always the same
service that fails (a wrapper that runs "/etc/init.d/rc 3", which we use while
we port the rest of the system to systemd).
