[systemd-devel] name=systemd cgroup mounts/hierarchy

Michal Koutný mkoutny at suse.com
Mon Nov 23 19:40:17 UTC 2020


On Thu, Nov 19, 2020 at 10:14:18PM +0300, Andrei Enshin <b1os at bk.ru> wrote:
> For you it might be interesting for the sake of improving the robustness of
> systemd against such invaders as kubelet+cgroupfs : )
I think the interface is clearly defined in the CGROUP_DELEGATION
document though.
In general, I'm happy if a bug can be found. I'm happier when it's a
well-defined and reproducible case.

> ########## (1) abandoned cgroup ##########
> > systemd isn't aware of it and it would clean the hierarchy according to its configuration
That was related to a controller hierarchy (which I understood the k8s
issue was about).

What you show below is a named hierarchy, where the situation is
different yet again.

> systemd hasn’t deleted the unknown hierarchy, it’s still present:
> [...]
> cgroup.procs here and in its child cgroup 8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15 are empty.
> It seems there are no processes attached to these cgroups. Date of creation is Jul 16-17.
What systemd version is it? What cgroup setup is it (legacy or hybrid)?
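(`systemctl --version` prints the version.) For the setup, a minimal
sketch in Python, assuming the usual systemd layout: it guesses legacy vs.
hybrid vs. unified from the filesystem types mounted at /sys/fs/cgroup and
/sys/fs/cgroup/unified in /proc/self/mountinfo. systemd itself decides
this with statfs() magic numbers, so this is only an approximation, and
the function names are illustrative.

```python
# Sketch: guess the cgroup setup from /proc/self/mountinfo.
# Assumption: the usual systemd layout (tmpfs at /sys/fs/cgroup for legacy and
# hybrid, plus cgroup2 at /sys/fs/cgroup/unified for hybrid). systemd itself
# uses statfs() magic numbers instead; this is only an approximation.


def mount_types(mountinfo_text):
    """Map each mount point to its filesystem type."""
    types = {}
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        # Per-mount fields come before " - "; the fstype is the first field after it.
        left, _, right = line.partition(" - ")
        mount_point = left.split()[4]
        types[mount_point] = right.split()[0]  # a later (stacked) mount wins
    return types


def cgroup_setup(mountinfo_text):
    """Return "unified", "hybrid", "legacy" or "unknown"."""
    types = mount_types(mountinfo_text)
    root = types.get("/sys/fs/cgroup")
    if root == "cgroup2":
        return "unified"
    if root == "tmpfs":
        # Hybrid adds a cgroup2 mount next to the v1 hierarchies.
        if types.get("/sys/fs/cgroup/unified") == "cgroup2":
            return "hybrid"
        return "legacy"
    return "unknown"
```

On the node: `print(cgroup_setup(open("/proc/self/mountinfo").read()))`.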


> ########## (2) mysterious mount of systemd hierarchy ########## 
> [...]
>   It seems to be a cyclic mount. The questions are: who made the second mysterious mount, why, and when?
> I have two candidates:
> - runc during container creation;
> - systemd, probably because it was confused by kubelet and its unexpected usage of cgroups.
I don't see why or how systemd (PID 1) would do this (I'm not sure about
nspawn). Anyway, you can try tracing mounts system-wide, e.g.
`perf trace -a -e syscalls:sys_enter_mount`, to find out who does the mount.
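perf answers "who". If you also want to know exactly "when" a mount shows
up, a minimal sketch in Python, assuming Linux: a change of the mount
table makes the mountinfo file report POLLPRI (proc(5) documents this for
/proc/[pid]/mounts; mountinfo uses the same mechanism). It only sees the
mount namespace it runs in, so run it on the host. The helper names are
hypothetical.

```python
# Sketch: print mounts as they appear in this process's mount namespace.
# Assumption: Linux, where a mount table change makes /proc/self/mountinfo
# report POLLPRI. This tells *when*, not *who*; use perf for the latter.
import select
import time


def read_lines(f):
    """Re-read the whole mount table from an open mountinfo file."""
    f.seek(0)
    return [line for line in f.read().splitlines() if line.strip()]


def added_mounts(before, after):
    """Lines present in `after` but not in `before` (new or changed mounts)."""
    seen = set(before)
    return [line for line in after if line not in seen]


def watch(path="/proc/self/mountinfo"):
    with open(path) as f:
        known = read_lines(f)
        poller = select.poll()
        poller.register(f, select.POLLPRI | select.POLLERR)
        while True:
            poller.poll()  # blocks until the mount table changes
            current = read_lines(f)
            for line in added_mounts(known, current):
                print(time.strftime("%Y-%m-%d %H:%M:%S"), line, flush=True)
            known = current
```

Comparing whole lines rather than mount IDs means a reused mount ID or a
remount with different options is reported too.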

> ########## (3) suspected owner of mysterious mount is systemd-nspawn machine ##########
> [...]
> Let’s explore the cgroups of the centos75 machine:
> # ls -lah /sys/fs/cgroup/systemd/machine.slice/systemd-nspawn\@centos75.service/payload/system.slice/ | grep sys-fs-cgroup-systemd
> 
> drwxr-xr-x.   2 root root 0 Nov  9 20:07 host\x2drootfs-sys-fs-cgroup-systemd-kubepods-burstable-pod7ffde41a\x2dfa85\x2d4b01\x2d8023\x2d69a4e4b50c55-8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15.mount
> 
> drwxr-xr-x.   2 root root 0 Jul 16 08:05 host\x2drootfs-sys-fs-cgroup-systemd.mount
> 
> drwxr-xr-x.   2 root root 0 Jul 16 08:05 host\x2drootfs-var-lib-machines-centos75-sys-fs-cgroup-systemd.mount
>   There are three interesting cgroups in the container. The first one seems to be related to the abandoned cgroup and the mysterious mount on the host.
Note that those are cgroups created for .mount units (under the nested
payload's system.slice). They tell you that, within the container, a mount
point at
> /host-rootfs/sys/fs/cgroup/systemd/kubepods/burstable/pod7ffde41a-fa85-4b01-8023-69a4e4b50c55/8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15
was visible. That doesn't mean the mount was done within the
container.
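To map such a cgroup name back to the mount point it stands for, the unit
name has to be unescaped: "-" separates path components and "\x2d" is a
literal "-". `systemd-escape --unescape --path` does this on the name
without the .mount suffix; the following is a minimal Python
re-implementation for illustration (`unit_name_to_path` is a hypothetical
helper, not a systemd API).

```python
# Sketch: turn a .mount unit name back into the mount point path.
# Mirrors systemd's path escaping: "-" separates path components and "\xNN"
# encodes a byte (so "\x2d" is a literal "-"). Hypothetical helper name.


def unit_name_to_path(unit):
    name = unit[: -len(".mount")] if unit.endswith(".mount") else unit
    if name == "-":  # the root directory is escaped as a single "-"
        return "/"
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == "-":
            out += b"/"
            i += 1
        elif name.startswith("\\x", i):
            out.append(int(name[i + 2 : i + 4], 16))
            i += 4
        else:
            out += name[i].encode()
            i += 1
    return "/" + out.decode()
```

For example, `unit_name_to_path(r"host\x2drootfs-sys-fs-cgroup-systemd.mount")`
returns "/host-rootfs/sys/fs/cgroup/systemd".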

I can't tell why that was; it depends on how systemd-nspawn was
instructed to set up mounts for the container.

> Creation date is Nov  9 20:07. I updated kubelet on Nov  8 12:01. Coincidence?! I don't think so.
Yes, it can be related. For instance:
- the cyclic bind mount happened,
- its visibility was propagated into the nspawn container,
- and the inner systemd created a cgroup for the (generated) .mount unit
  (possibly after a daemon-reload).
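Whether such a mount becomes visible elsewhere is governed by mount
propagation, which the optional fields of mountinfo show: "shared:N"
marks a member of peer group N, "master:N" a slave that receives events
from group N, and a mount with neither is private. A minimal sketch in
Python that groups mounts by peer group, to be run against both the
host's and the container's mountinfo; `peer_groups` is a hypothetical
helper.

```python
# Sketch: group mounts by propagation peer group from mountinfo.
# "shared:N" = member of peer group N; "master:N" = slave receiving events
# from group N; no tag = private. Hypothetical helper name.
from collections import defaultdict


def peer_groups(mountinfo_text):
    """Map peer group ID -> [(mount point, "shared" | "slave"), ...]."""
    groups = defaultdict(list)
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        fields = line.partition(" - ")[0].split()
        mount_point = fields[4]
        # Optional fields sit between the mount options (index 5) and " - ".
        for tag in fields[6:]:
            kind, _, num = tag.partition(":")
            if kind == "shared":
                groups[int(num)].append((mount_point, "shared"))
            elif kind == "master":
                groups[int(num)].append((mount_point, "slave"))
    return dict(groups)
```

In the mountinfo output quoted further down, all four name=systemd
entries carry "shared:6", so a mount made under any of them propagates to
the others.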

> Q1. Let me ask: what is the meaning of the mount inside the centos75 container?
> /system.slice/host\x2drootfs-sys-fs-cgroup-systemd-kubepods-burstable-pod7ffde41a\x2dfa85\x2d4b01\x2d8023\x2d69a4e4b50c55-8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15.mount
> 
> Q2. Why did the mount appear in the container on Nov 9, 20:07?
Hopefully, both are answered above.

> ##### mind-blowing but might be important note #####
> [...]
> The node already seems to have unhealthy mounts:
Is the conflicting cgroup driver being used there again?

> # cat /proc/self/mountinfo |grep systemd | grep cgr
> 26 25 0:23 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 866 865 0:23 / /var/lib/rkt/pods/run/3720606d-535b-4e59-a137-ee00246a20c1/stage1/rootfs/opt/stage2/hyperkube-amd64/rootfs/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 5253 26 0:23 /kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10 /sys/fs/cgroup/systemd/kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10 rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 5251 866 0:23 /kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10 /var/lib/rkt/pods/run/3720606d-535b-4e59-a137-ee00246a20c1/stage1/rootfs/opt/stage2/hyperkube-amd64/rootfs/sys/fs/cgroup/systemd/kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10 rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
>   It also seems systemd-nspawn is not affected yet, since there is no such cgroup inside the centos75 container (we have it on each machine), only the abandoned one with empty cgroup.procs:
That would depend on the mount propagation into that container and on
what systemd inside that container did (e.g. the mount unit may not
have been created yet).
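The entries 5253 and 5251 above have a root other than "/" and a mount
point equal to their parent's mount point plus that root, i.e. a cgroup
directory bind-mounted onto its own location (the "cyclic" mount). A
minimal heuristic sketch in Python that flags such mounts in mountinfo;
`self_bind_mounts` is a hypothetical helper and it skips corner cases such
as mounts whose parent is not listed.

```python
# Heuristic sketch: find directories bind-mounted onto their own location.
# A mount M is flagged when its parent P shows the same filesystem (same
# major:minor) and M's root directory, seen through P, is exactly M's mount
# point. Mounts whose parent is not listed are skipped. Hypothetical names.


def parse_mountinfo(mountinfo_text):
    mounts = {}
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        fields = line.partition(" - ")[0].split()
        mounts[fields[0]] = {
            "parent": fields[1],
            "dev": fields[2],
            "root": fields[3],
            "point": fields[4],
        }
    return mounts


def self_bind_mounts(mountinfo_text):
    mounts = parse_mountinfo(mountinfo_text)
    found = []
    for mid, m in mounts.items():
        p = mounts.get(m["parent"])
        if p is None or m["parent"] == mid or p["dev"] != m["dev"]:
            continue
        prefix = p["root"].rstrip("/")
        if m["root"] != prefix and not m["root"].startswith(prefix + "/"):
            continue  # M's directory is not reachable through P
        rel = m["root"][len(prefix):]
        # Path at which M's root directory appears inside P.
        visible_at = (p["point"].rstrip("/") + rel).rstrip("/") or "/"
        if visible_at == m["point"]:
            found.append(m["point"])
    return found
```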

Michal