[systemd-devel] [PATCHSET RE-RESEND] update unified hierarchy support
Zbigniew Jędrzejewski-Szmek
zbyszek at in.waw.pl
Sat Mar 26 15:41:59 UTC 2016
Hi,
we've moved the development of systemd to github, it's best
to open pull requests there.
I uploaded your patches as https://github.com/systemd/systemd/pull/2902.
Let's discuss things there.
Zbyszek
On Fri, Mar 25, 2016 at 02:34:00PM -0400, Tejun Heo wrote:
> (sorry, of course forgot to attach the patches)
> (bounced for not being subscribed, resending...)
>
> Hello,
>
> Unified hierarchy is available on the 4.5 kernel but there have been
> several updates.
>
> 1. The __DEVEL__sane_behavior flag is gone. Unified hierarchy is now
> available as "cgroup2" filesystem type with its own super magic
> number.
>
> 2. "cgroup.populated" file is replaced with "populated" field of
> "cgroup.events" file.
>
> 3. A zombie task remains associated with the cgroup it was associated
> with at the time of death instead of being moved immediately to
> root. This means that pid to unit lookup may return a slice if the
> session or service unit the pid belonged to is already gone.
>
> Three patches are attached addressing each of the above.
>
> Thanks!
>
> --
> tejun
> From 278a39f0a8fa34cd899c6a08e76626c987a4713e Mon Sep 17 00:00:00 2001
> From: Tejun Heo <htejun at fb.com>
> Date: Fri, 25 Mar 2016 11:38:50 -0400
> Subject: [PATCH 1/3] core: update unified hierarchy support
>
> Unified hierarchy is official as of Linux v4.5 and now available through a new
> filesystem type, cgroup2, with its own super magic. Update mount logic
> accordingly.
>
> Signed-off-by: Tejun Heo <htejun at fb.com>
> ---
> src/basic/cgroup-util.c | 2 +-
> src/basic/missing.h | 4 ++++
> src/core/mount-setup.c | 2 +-
> 3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/basic/cgroup-util.c b/src/basic/cgroup-util.c
> index 56c1fca..5124b5b 100644
> --- a/src/basic/cgroup-util.c
> +++ b/src/basic/cgroup-util.c
> @@ -2129,7 +2129,7 @@ int cg_unified(void) {
> if (statfs("/sys/fs/cgroup/", &fs) < 0)
> return -errno;
>
> - if (F_TYPE_EQUAL(fs.f_type, CGROUP_SUPER_MAGIC))
> + if (F_TYPE_EQUAL(fs.f_type, CGROUP2_SUPER_MAGIC))
> unified_cache = true;
> else if (F_TYPE_EQUAL(fs.f_type, TMPFS_MAGIC))
> unified_cache = false;
> diff --git a/src/basic/missing.h b/src/basic/missing.h
> index 034e334..66cd592 100644
> --- a/src/basic/missing.h
> +++ b/src/basic/missing.h
> @@ -437,6 +437,10 @@ struct btrfs_ioctl_quota_ctl_args {
> #define CGROUP_SUPER_MAGIC 0x27e0eb
> #endif
>
> +#ifndef CGROUP2_SUPER_MAGIC
> +#define CGROUP2_SUPER_MAGIC 0x63677270
> +#endif
> +
> #ifndef TMPFS_MAGIC
> #define TMPFS_MAGIC 0x01021994
> #endif
> diff --git a/src/core/mount-setup.c b/src/core/mount-setup.c
> index de1a361..32fe51c 100644
> --- a/src/core/mount-setup.c
> +++ b/src/core/mount-setup.c
> @@ -94,7 +94,7 @@ static const MountPoint mount_table[] = {
> #endif
> { "tmpfs", "/run", "tmpfs", "mode=755", MS_NOSUID|MS_NODEV|MS_STRICTATIME,
> NULL, MNT_FATAL|MNT_IN_CONTAINER },
> - { "cgroup", "/sys/fs/cgroup", "cgroup", "__DEVEL__sane_behavior", MS_NOSUID|MS_NOEXEC|MS_NODEV,
> + { "cgroup", "/sys/fs/cgroup", "cgroup2", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV,
> cg_is_unified_wanted, MNT_FATAL|MNT_IN_CONTAINER },
> { "tmpfs", "/sys/fs/cgroup", "tmpfs", "mode=755", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME,
> cg_is_legacy_wanted, MNT_FATAL|MNT_IN_CONTAINER },
> --
> 2.5.5
>
> From 0fed0c3cdebe72557db528572ed2c531e32e7d5a Mon Sep 17 00:00:00 2001
> From: Tejun Heo <htejun at fb.com>
> Date: Fri, 25 Mar 2016 11:38:50 -0400
> Subject: [PATCH 2/3] core: update populated event handling in unified
> hierarchy
>
> Earlier during the development of unified hierarchy, the populated event was
> reported through by the dedicated "cgroup.populated" file; however, the
> interface was updated so that it's reported through the "populated" field of
> "cgroup.events" file. Update populated event handling logic accordingly.
>
> Signed-off-by: Tejun Heo <htejun at fb.com>
> ---
> src/basic/cgroup-util.c | 45 ++++++++++++++++++++++++++++++++++++---------
> src/basic/cgroup-util.h | 2 ++
> src/core/cgroup.c | 6 +++---
> src/nspawn/nspawn-cgroup.c | 3 +--
> 4 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/src/basic/cgroup-util.c b/src/basic/cgroup-util.c
> index 5124b5b..5043180 100644
> --- a/src/basic/cgroup-util.c
> +++ b/src/basic/cgroup-util.c
> @@ -101,6 +101,39 @@ int cg_read_pid(FILE *f, pid_t *_pid) {
> return 1;
> }
>
> +int cg_read_event(const char *controller, const char *path, const char *event,
> + char **val)
> +{
> + _cleanup_free_ char *events = NULL, *content = NULL;
> + char *p, *line;
> + int r;
> +
> + r = cg_get_path(controller, path, "cgroup.events", &events);
> + if (r < 0)
> + return r;
> +
> + r = read_full_file(events, &content, NULL);
> + if (r < 0)
> + return r;
> +
> + p = content;
> + while ((line = strsep(&p, "\n"))) {
> + char *key;
> +
> + key = strsep(&line, " ");
> + if (!key || !line)
> + return -EINVAL;
> +
> + if (strcmp(key, event))
> + continue;
> +
> + *val = strdup(line);
> + return 0;
> + }
> +
> + return -ENOENT;
> +}
> +
> int cg_enumerate_subgroups(const char *controller, const char *path, DIR **_d) {
> _cleanup_free_ char *fs = NULL;
> int r;
> @@ -1007,18 +1040,12 @@ int cg_is_empty_recursive(const char *controller, const char *path) {
> return unified;
>
> if (unified > 0) {
> - _cleanup_free_ char *populated = NULL, *t = NULL;
> + _cleanup_free_ char *t = NULL;
>
> /* On the unified hierarchy we can check empty state
> - * via the "cgroup.populated" attribute. */
> + * via the "populated" attribute of "cgroup.events". */
>
> - r = cg_get_path(controller, path, "cgroup.populated", &populated);
> - if (r < 0)
> - return r;
> -
> - r = read_one_line_file(populated, &t);
> - if (r == -ENOENT)
> - return 1;
> + r = cg_read_event(controller, path, "populated", &t);
> if (r < 0)
> return r;
>
> diff --git a/src/basic/cgroup-util.h b/src/basic/cgroup-util.h
> index ad1edd9..4254e51 100644
> --- a/src/basic/cgroup-util.h
> +++ b/src/basic/cgroup-util.h
> @@ -96,6 +96,8 @@ static inline bool CGROUP_BLKIO_WEIGHT_IS_OK(uint64_t x) {
>
> int cg_enumerate_processes(const char *controller, const char *path, FILE **_f);
> int cg_read_pid(FILE *f, pid_t *_pid);
> +int cg_read_event(const char *controller, const char *path, const char *event,
> + char **val);
>
> int cg_enumerate_subgroups(const char *controller, const char *path, DIR **_d);
> int cg_read_subgroup(DIR *d, char **fn);
> diff --git a/src/core/cgroup.c b/src/core/cgroup.c
> index 39235a9..9c34928 100644
> --- a/src/core/cgroup.c
> +++ b/src/core/cgroup.c
> @@ -765,7 +765,7 @@ int unit_set_cgroup_path(Unit *u, const char *path) {
> }
>
> int unit_watch_cgroup(Unit *u) {
> - _cleanup_free_ char *populated = NULL;
> + _cleanup_free_ char *events = NULL;
> int r;
>
> assert(u);
> @@ -791,11 +791,11 @@ int unit_watch_cgroup(Unit *u) {
> if (r < 0)
> return log_oom();
>
> - r = cg_get_path(SYSTEMD_CGROUP_CONTROLLER, u->cgroup_path, "cgroup.populated", &populated);
> + r = cg_get_path(SYSTEMD_CGROUP_CONTROLLER, u->cgroup_path, "cgroup.events", &events);
> if (r < 0)
> return log_oom();
>
> - u->cgroup_inotify_wd = inotify_add_watch(u->manager->cgroup_inotify_fd, populated, IN_MODIFY);
> + u->cgroup_inotify_wd = inotify_add_watch(u->manager->cgroup_inotify_fd, events, IN_MODIFY);
> if (u->cgroup_inotify_wd < 0) {
>
> if (errno == ENOENT) /* If the directory is already
> diff --git a/src/nspawn/nspawn-cgroup.c b/src/nspawn/nspawn-cgroup.c
> index 9f9a475..a63c30e 100644
> --- a/src/nspawn/nspawn-cgroup.c
> +++ b/src/nspawn/nspawn-cgroup.c
> @@ -55,8 +55,7 @@ int chown_cgroup(pid_t pid, uid_t uid_shift) {
> "cgroup.events",
> "cgroup.clone_children",
> "cgroup.controllers",
> - "cgroup.subtree_control",
> - "cgroup.populated")
> + "cgroup.subtree_control")
> if (fchownat(fd, fn, uid_shift, uid_shift, 0) < 0)
> log_full_errno(errno == ENOENT ? LOG_DEBUG : LOG_WARNING, errno,
> "Failed to chown() cgroup file %s, ignoring: %m", fn);
> --
> 2.5.5
>
> From 80d7d0ad42ad910acd06517f3f3862eeb1c9dc59 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <htejun at fb.com>
> Date: Fri, 25 Mar 2016 11:38:50 -0400
> Subject: [PATCH 3/3] core: update invoke_sigchld_event() to handle NULL
> ->sigchld_event()
>
> After receiving SIGCHLD, one of the ways manager_dispatch_sigchld() maps the
> now zombie $PID to its unit is through manager_get_unit_by_pid_cgroup() which
> reads /proc/$PID/cgroup and looks up the unit associated with the cgroup path.
>
> On non-unified cgroup hierarchies, a process is immediately migrated to the
> root cgroup on death and the cgroup lookup would always have returned the unit
> associated with it, making it rather pointless but safe. On unified hierarchy,
> a zombie remains associated with the cgroup that it was associated with at the
> time of death and thus manager_get_unit_by_pid_cgroup() will look up the unit
> properly.
>
> However, by the time manager_dispatch_sigchld() is running, the original cgroup
> may have become empty and it and its associated unit might already have been
> removed. If the cgroup path doesn't yield a match, manager_dispatch_sigchld()
> keeps pruning the leaf component. This means that the function may return a
> slice unit for a pid and as a slice doesn't have ->sigchld_event() handler,
> calling invoke_sigchld_event() on it causes a segfault.
>
> This patch updates invoke_sigchld_event() so that it skips calling if the
> handler is not set.
>
> Signed-off-by: Tejun Heo <htejun at fb.com>
> ---
> src/core/manager.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/core/manager.c b/src/core/manager.c
> index f13e933..78f2a21 100644
> --- a/src/core/manager.c
> +++ b/src/core/manager.c
> @@ -1631,7 +1631,9 @@ static void invoke_sigchld_event(Manager *m, Unit *u, const siginfo_t *si) {
> log_unit_debug(u, "Child "PID_FMT" belongs to %s", si->si_pid, u->id);
>
> unit_unwatch_pid(u, si->si_pid);
> - UNIT_VTABLE(u)->sigchld_event(u, si->si_pid, si->si_code, si->si_status);
> +
> + if (UNIT_VTABLE(u)->sigchld_event)
> + UNIT_VTABLE(u)->sigchld_event(u, si->si_pid, si->si_code, si->si_status);
> }
>
> static int manager_dispatch_sigchld(Manager *m) {
> --
> 2.5.5
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel
More information about the systemd-devel
mailing list