[systemd-devel] [WIP PATCH] Do not realize and migrate cgroups multiple times
Michal Sekletar
msekleta at redhat.com
Tue Dec 2 03:32:34 PST 2014
On Mon, Dec 01, 2014 at 12:06:03PM +0100, Martin Pitt wrote:
> Hello all,
>
> In my efforts to make user LXC containers work I noticed that under a
> "real" desktop (not just nspawn with VT login or ssh logins) my
> carefully set up cgroups in the non-systemd controllers get reverted.
> I. e. I put the session leader (and all other pids) of logind sessions
> (/user.slice/user-1000.slice/session-XX.scope) into all cgroup
> controllers, but a second after they are all back in the / cgroups (or
> sometimes in /user.slice). After some days of debugging (during which
> I learned a lot about the innards of systemd :-) I finally found out
> why:
>
> During unit cgroup initialization (unit_realize_cgroup), sibling
> cgroups are queued instead of initialized immediately.
> unit_create_cgroups() makes an attempt to avoid initializing and
> migrating a cgroup more than once:
>
> path = unit_default_cgroup_path(u);
> [...]
> r = hashmap_put(u->manager->cgroup_unit, path, u);
> if (r < 0) {
> log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
> return r;
> }
>
> But this doesn't work: "path" always gets allocated freshly in that
> function, so the pointer is never in the hashmap and the -EEXISTS
> never actually hits. This causes -.slice to be initialized and
> recursively migrated a gazillion times, which explains the random
> punting of sub-cgroup PIDs back to /.
>
> I fixed this with an explicit hashmap_get() call, which compares
> values instead of pointers.
>
> I also added some debugging to that function:
>
> log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));
>
> (right before the hashmap_put() above), which illustrates the
> problem:
>
> systemd[1]: Starting Root Slice.
> systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil)
> systemd[1]: Created slice Root Slice.
> [...]
> pid1 systemd[1]: unit_create_cgroups session-c1.scope: path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil)
> systemd[1]: Started Session c1 of user martin.
> [... later on when most things have been started ...]
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup exists already
> systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup exists already
> systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File exists
> systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap 0x7f0e14aa4850
> systemd[1]: unit_create_cgroups -.slice: cgroup exists already
> systemd[1]: Failed to realize cgroups for queued unit networking.slice: File exists
>
> ... and so on, basically once for each .service.
>
> Initializing -.slice so many times is certainly an unintended effect
> of the peer cgroup creation. Thus the patch fixes the multiple
> initialization/creation, but a proper fix should also quiesce these
> error messages. The condition could be checked explicitly, i. e. we
> skip the "Failed to realize..." error for EEXISTS, or we generally
> tone this down to log_debug. I'm open to suggestions. And of course
> the log_debug should be removed; it's nice to illustrate the problem,
> but doesn't really belong into production code.
>
> Thanks,
>
> Martin
Also this looks like a possible fix to the problem I tried to describe in,
http://lists.freedesktop.org/archives/systemd-devel/2014-November/025607.html
Michal
>
> --
> Martin Pitt | http://www.piware.de
> Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
> From fd2f8a444c9c644a39dd3b619934e8768e03c2a3 Mon Sep 17 00:00:00 2001
> From: Martin Pitt <martin.pitt at ubuntu.com>
> Date: Mon, 1 Dec 2014 10:50:06 +0100
> Subject: [PATCH] Do not realize and migrate cgroups multiple times
>
> unit_create_cgroups() tries to check if a cgroup already exists. But has the
> destination path is always allocated dynamically as a new string, that pointer
> will never already be in the hashmap, thus hashmap_put() will never actually
> fail with EEXISTS. Thus check for the existance of the cgroup path explicitly.
>
> Before this, "-.slice" got initialized and its child PIDs migrated many times
> through queuing the realization of sibling units; thiss caused any cgroup
> controller layout from sub-cgroups to be reverted and their pids moved back to
> the root cgroup in all controllers (except systemd).
> ---
> src/core/cgroup.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/src/core/cgroup.c b/src/core/cgroup.c
> index 8820bee..3d36080 100644
> --- a/src/core/cgroup.c
> +++ b/src/core/cgroup.c
> @@ -614,6 +614,13 @@ static int unit_create_cgroups(Unit *u, CGroupControllerMask mask) {
> if (!path)
> return log_oom();
>
> + log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path));
> +
> + if (hashmap_get(u->manager->cgroup_unit, path)) {
> + log_error("unit_create_cgroups %s: cgroup %s exists already", u->id, u->cgroup_path);
> + return -EEXIST;
> + }
> +
> r = hashmap_put(u->manager->cgroup_unit, path, u);
> if (r < 0) {
> log_error(r == -EEXIST ? "cgroup %s exists already: %s" : "hashmap_put failed for %s: %s", path, strerror(-r));
> --
> 2.1.3
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
More information about the systemd-devel
mailing list