[systemd-devel] Possible race condition for setting cgroup sticky bit

Mon Apr 8 09:51:07 PDT 2013

On Mon, 08.04.13 16:57, Anders Olofsson (anders.olofsson at axis.com) wrote:

> Ok, let's see if I can explain what we've done here.
> 
> To introduce systemd in our system, we've started with just wrapping rc and all the old initscripts so we can get systemd running first and then afterwards start converting to native services.
> The boot is basically two services: legacy_rcS.service (which runs "/etc/init.d/rc S") and legacy_rc3.service (which runs "/etc/init.d/rc 3"). There is also a legacy_rc4.service (wanted by upgrade.target) used for firmware upgrades and similar special system actions.
> Journal, udev and syslog run as separate services outside these wrappers and the idea is to migrate boot scripts to services a few at a time until the legacy wrappers are empty and can be dropped.
> 
> The following is the service file for the runlevel 3 wrapper:
> [Unit]
> Description=Legacy runlevel 3
> Wants=legacy_rcS.service
> After=legacy_rcS.service
> Conflicts=legacy_rc4.service
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/etc/init.d/rc 3
> StandardOutput=tty
> Environment=RUNLEVEL=3
> Environment=PREVLEVEL=X
> ControlGroup=systemd:/system/legacy_rc.service
> ControlGroupPersistent=true
> KillMode=none
> 
> The same cgroup is configured for all the legacy services (rcS, rc3
> and rc4).

Ah, that's the issue. You can't really do manipulations like that in
systemd's own hierarchy: sticking multiple services in the same cgroup
in the name=systemd hierarchy will break things heavily (and I am
surprised you didn't run into that problem earlier). It is OK to stick
multiple services into the same group on all other hierarchies, but not in
systemd's own. In fact, you shouldn't really fiddle with systemd's private
hierarchy at all. We need that to keep track of our own service state
(i.e. for checking whether a service is still running), we will use it
to kill services and so on, we take the liberty to remove groups in that
hierarchy as we see fit... and for that we need to keep the groups
separate.

This is actually documented in the man pages:

"It is not recommended to manipulate the service control group path in
the systemd named hierarchy." (see systemd.exec(5) the part about
ControlGroup=)

I have now changed the man page to be a bit stronger here, and say that
you might get undefined behaviour if you change systemd's own hierarchy.

> 
> When looking in sysfs, I see cgroups for all the legacy services, even though the rcS and rc3 services use the configured generic cgroup.
> The following is from a working system; when a failure happens, rc and rcS are present, but not rc3:
> # ls -d /sys/fs/cgroup/systemd/system/legacy*
> /sys/fs/cgroup/systemd/system/legacy_rc.service
> /sys/fs/cgroup/systemd/system/legacy_rc3.service
> /sys/fs/cgroup/systemd/system/legacy_rcS.service
> 
> This is what I meant by "cgroups for all services" existing even
> though the default has been overridden.  Without the ControlGroup= setting,
> legacy_rc3 and legacy_rcS would have used the cgroups with the same
> names. But since we've specified that we want a different name, I'm
> wondering why I still see the default names that we don't want to use.

Well, the systemd hierarchy is special. We have special semantics for it,
and you shouldn't alter it. You are free to rearrange cgroups in all
other hierarchies and drop as many services in the same cgroup as you
wish for those, but not for systemd's own name=systemd hierarchy.
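
As a rough, untested sketch of what that could look like for your
wrappers: drop the ControlGroup=systemd:... line and, if you still want
the legacy services grouped together, point ControlGroup= at some other
hierarchy instead. This assumes the "cpu" controller is mounted on your
system; the /legacy_rc path is just a made-up name for illustration.

```ini
[Service]
# No ControlGroup=systemd:... here: each service keeps its own default
# group in the name=systemd hierarchy, so service tracking keeps working.
# Assumption: the "cpu" controller is mounted; "/legacy_rc" is a
# hypothetical path that rcS, rc3 and rc4 could all share.
ControlGroup=cpu:/legacy_rc
```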

I hope this makes sense,

Lennart

-- 
Lennart Poettering - Red Hat, Inc.