[systemd-devel] [WIP PATCH] Do not realize and migrate cgroups multiple times

Martin Pitt martin.pitt at ubuntu.com
Mon Dec 1 03:06:03 PST 2014


Hello all,

In my efforts to make user LXC containers work I noticed that under a
"real" desktop (not just nspawn with VT login or ssh logins) my
carefully set up cgroups in the non-systemd controllers get reverted.
I. e. I put the session leader (and all other pids) of logind sessions
(/user.slice/user-1000.slice/session-XX.scope) into all cgroup
controllers, but a second after they are all back in the / cgroups (or
sometimes in /user.slice). After some days of debugging (during which
I learned a lot about the innards of systemd :-) I finally found out
why:

During unit cgroup initialization (unit_realize_cgroup), sibling
cgroups are queued instead of initialized immediately.
unit_create_cgroups() makes an attempt to avoid initializing and
migrating a cgroup more than once:

       path = unit_default_cgroup_path(u);
       [...]
       r = hashmap_put(u->manager->cgroup_unit, path, u);
        if (r < 0) {
                log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
                return r;
        }

But this doesn't work: "path" always gets allocated freshly in that
function, so the pointer is never in the hashmap and the -EEXISTS
never actually hits. This causes -.slice to be initialized and
recursively migrated a gazillion times, which explains the random
punting of sub-cgroup PIDs back to /.

I fixed this with an explicit hashmap_get() call, which compares
values instead of pointers.

I also added some debugging to that function:

  log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));

(right before the hashmap_put() above), which illustrates the
problem:

systemd[1]: Starting Root Slice.
systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil)
systemd[1]: Created slice Root Slice.
[...]
pid1 systemd[1]: unit_create_cgroups session-c1.scope: path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil)
systemd[1]: Started Session c1 of user martin.
[... later on when most things have been started ...]
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File exists
systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
systemd[1]: Failed to realize cgroups for queued unit networking.slice: File exists

... and so on, basically once for each .service.

Initializing -.slice so many times is certainly an unintended effect
of the peer cgroup creation. Thus the patch fixes the multiple
initialization/creation, but a proper fix should also quiesce these
error messages. The condition could be checked explicitly, i. e. we
skip the "Failed to realize..." error for EEXISTS, or we generally
tone this down to log_debug. I'm open to suggestions. And of course
the log_debug should be removed; it's nice to illustrate the problem,
but doesn't really belong into production code.

Thanks,

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Do-not-realize-and-migrate-cgroups-multiple-times.patch
Type: text/x-diff
Size: 1748 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20141201/08af6f30/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20141201/08af6f30/attachment.sig>


More information about the systemd-devel mailing list