[systemd-devel] [WIP PATCH] Do not realize and migrate cgroups multiple times

Tue Dec 2 03:32:34 PST 2014

On Mon, Dec 01, 2014 at 12:06:03PM +0100, Martin Pitt wrote:
> Hello all,
> 
> In my efforts to make user LXC containers work I noticed that under a
> "real" desktop (not just nspawn with VT login or ssh logins) my
> carefully set up cgroups in the non-systemd controllers get reverted.
> I. e. I put the session leader (and all other pids) of logind sessions
> (/user.slice/user-1000.slice/session-XX.scope) into all cgroup
> controllers, but a second after they are all back in the / cgroups (or
> sometimes in /user.slice). After some days of debugging (during which
> I learned a lot about the innards of systemd :-) I finally found out
> why:
> 
> During unit cgroup initialization (unit_realize_cgroup), sibling
> cgroups are queued instead of initialized immediately.
> unit_create_cgroups() makes an attempt to avoid initializing and
> migrating a cgroup more than once:
> 
>        path = unit_default_cgroup_path(u);
>        [...]
>        r = hashmap_put(u->manager->cgroup_unit, path, u);
>         if (r < 0) {
>                 log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
>                 return r;
>         }
> 
> But this doesn't work: "path" always gets allocated freshly in that
> function, so the pointer is never in the hashmap and the -EEXISTS
> never actually hits. This causes -.slice to be initialized and
> recursively migrated a gazillion times, which explains the random
> punting of sub-cgroup PIDs back to /.
> 
> I fixed this with an explicit hashmap_get() call, which compares
> values instead of pointers.
> 
> I also added some debugging to that function:
> 
>   log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));
> 
> (right before the hashmap_put() above), which illustrates the
> problem:
> 
> systemd[1]: Starting Root Slice.
> systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil)
> systemd[1]: Created slice Root Slice.
> [...]
> pid1 systemd[1]: unit_create_cgroups session-c1.scope: path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil)
> systemd[1]: Started Session c1 of user martin.
> [... later on when most things have been started ...]
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
> systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
> systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File exists
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup  exists already
> systemd[1]: Failed to realize cgroups for queued unit networking.slice: File exists
> 
> ... and so on, basically once for each .service.
> 
> Initializing -.slice so many times is certainly an unintended effect
> of the peer cgroup creation. Thus the patch fixes the multiple
> initialization/creation, but a proper fix should also quiesce these
> error messages. The condition could be checked explicitly, i. e. we
> skip the "Failed to realize..." error for EEXISTS, or we generally
> tone this down to log_debug. I'm open to suggestions. And of course
> the log_debug should be removed; it's nice to illustrate the problem,
> but doesn't really belong into production code.
> 
> Thanks,
> 
> Martin

Also this looks like a possible fix to the problem I tried to describe in,

http://lists.freedesktop.org/archives/systemd-devel/2014-November/025607.html

Michal

> 
> -- 
> Martin Pitt                        | http://www.piware.de
> Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

> From fd2f8a444c9c644a39dd3b619934e8768e03c2a3 Mon Sep 17 00:00:00 2001
> From: Martin Pitt <martin.pitt at ubuntu.com>
> Date: Mon, 1 Dec 2014 10:50:06 +0100
> Subject: [PATCH] Do not realize and migrate cgroups multiple times
> 
> unit_create_cgroups() tries to check if a cgroup already exists. But has the
> destination path is always allocated dynamically as a new string, that pointer
> will never already be in the hashmap, thus hashmap_put() will never actually
> fail with EEXISTS. Thus check for the existance of the cgroup path explicitly.
> 
> Before this, "-.slice" got initialized and its child PIDs migrated many times
> through queuing the realization of sibling units; thiss caused any cgroup
> controller layout from sub-cgroups to be reverted and their pids moved back to
> the root cgroup in all controllers (except systemd).
> ---
>  src/core/cgroup.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/core/cgroup.c b/src/core/cgroup.c
> index 8820bee..3d36080 100644
> --- a/src/core/cgroup.c
> +++ b/src/core/cgroup.c
> @@ -614,6 +614,13 @@ static int unit_create_cgroups(Unit *u, CGroupControllerMask mask) {
>          if (!path)
>                  return log_oom();
>  
> +        log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));
> +
> +        if (hashmap_get(u->manager->cgroup_unit, path)) {
> +                log_error("unit_create_cgroups %s: cgroup %s exists already", u->id, u->cgroup_path);
> +                return -EEXIST;
> +        }
> +
>          r = hashmap_put(u->manager->cgroup_unit, path, u);
>          if (r < 0) {
>                  log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
> -- 
> 2.1.3
> 

> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel