[systemd-devel] name=systemd cgroup mounts/hierarchy

Andrei Enshin b1os at bk.ru
Wed Nov 18 18:46:03 UTC 2020


Thank you for checking!

Yes, it clearly seems that systemd and the kubelet share the cgroup hierarchy in this setup, which is not supposed to happen.
We will prioritize moving our cluster to the systemd cgroup driver to avoid this conflict.
I also think it would be good to have an extra check on the kubelet side to avoid running the cgroupfs driver on systemd systems, but that is a question for the k8s folks and has already been raised in Slack.
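Such a check could be quite small. A minimal sketch, assuming the check compares PID 1's comm name against the configured driver (the function name and its wiring are hypothetical, not actual kubelet code):

```shell
# Hypothetical preflight check, NOT real kubelet code: warn when the
# 'cgroupfs' driver is configured on a host where systemd is PID 1.
detect_driver_conflict() {
    init_comm=$1      # e.g. the contents of /proc/1/comm
    cgroup_driver=$2  # the kubelet's configured cgroupDriver value
    if [ "$init_comm" = "systemd" ] && [ "$cgroup_driver" = "cgroupfs" ]; then
        echo "conflict"
    else
        echo "ok"
    fi
}

# On a real node the driver value would come from the kubelet config;
# "cgroupfs" here is just an example input.
detect_driver_conflict "$(cat /proc/1/comm)" "cgroupfs"
```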
 
-----
Just out of curiosity, how exactly might systemd be disrupted during service (de)activation by a record such as /kubepods/bla/bla in the root of its cgroup hierarchy?
Or how might it disrupt the kubelet or the workloads run by it?

Will systemd delete such records because of some internal logic? Or will there be a name conflict during cgroup creation?
I would be happy to learn more details about the cgroup interference.
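For what it's worth, my rough mental model of the suspected race is the following sketch. It uses a plain tmpdir to stand in for a cgroupfs mount, since real cgroup writes require root, but the mkdir/rmdir semantics are the same on cgroupfs directories; the failure at the end mirrors the "root container [kubepods] doesn't exist" error:

```shell
# Sketch of the suspected race between two cgroup managers sharing one
# root. A plain tmpdir stands in for a cgroupfs mount; cgroupfs
# directories obey the same mkdir/rmdir rules.
root=$(mktemp -d)
mkdir "$root/kubepods"             # manager A (kubelet) creates its top-level group
rmdir "$root/kubepods"             # manager B (systemd) trims an empty group it does not track
mkdir "$root/kubepods/burstable" 2>/dev/null \
    || echo "kubepods doesn't exist"   # manager A's next operation now fails
rm -rf "$root"
```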

I've read a few articles:
https://systemd.io/CGROUP_DELEGATION/
http://0pointer.de/blog/projects/cgroups-vs-cgroups.html
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/

and even an outdated one:
https://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups/

It seems I have missed the technical details of how exactly they interfere.
-----

> It may be a residual inside kubelet context when environment was prepared for a container spawned from within this context

Just one last finding about this weird cgroup mount:
# find / -name '*8842def24*'
/sys/fs/cgroup/systemd/kubepods/burstable/pod7ffde41a-fa85-4b01-8023-69a4e4b50c55/8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15
/sys/fs/cgroup/systemd/machine.slice/systemd-nspawn at centos75.service/payload/system.slice/host\x2drootfs-sys-fs-cgroup-systemd-kubepods-burstable-pod7ffde41a\x2dfa85\x2d4b01\x2d8023\x2d69a4e4b50c55-8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15.mount

and

# machinectl list
MACHINE  CLASS     SERVICE        OS     VERSION ADDRESSES
centos75 container systemd-nspawn centos 7       -        
frr      container systemd-nspawn ubuntu 18.04   -        

Two machines are listed. Since the container with id 8842def241 is not running, it's hard to understand what exactly happened and who performed such a mount, or to reproduce the conflict.
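As an aside, the long `host\x2drootfs-...` name in the second `find` hit is just a systemd mount unit whose name is the escaped mount path (what `systemd-escape --path` produces). A minimal sketch of the core of that escaping:

```shell
# Minimal sketch of systemd's path escaping (cf. systemd-escape --path):
# drop the leading '/', escape literal '-' as \x2d, then turn '/' into '-'.
# The real systemd-escape also hex-escapes other special characters.
escape_path() {
    printf '%s' "$1" | sed -e 's|^/||' -e 's/-/\\x2d/g' -e 's|/|-|g'
}

escape_path /host-rootfs/sys/fs/cgroup/systemd
```

which yields the `host\x2drootfs-sys-fs-cgroup-systemd-...` prefix seen inside the nspawn container's payload.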

May I ask how systemd-nspawn may be involved in this? Or do you have any ideas about what happened, so that I ended up with the systemd named hierarchy mounted twice?
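To spot the duplicate without grepping mountinfo by eye, one can count the cgroup mounts carrying the `name=systemd` super option (a small sketch; fields are indexed from the end because mountinfo's optional-fields column varies in width):

```shell
# Count mounts of the named systemd cgroup v1 hierarchy in a mountinfo
# file; more than one in a single mount namespace is the anomaly above.
# The fstype is the third field from the end ($(NF-2)) and the super
# options are the last field, which is stable despite optional fields.
count_named_systemd_mounts() {
    awk '$(NF-2) == "cgroup" && $NF ~ /name=systemd/ { n++ } END { print n+0 }' "$1"
}

count_named_systemd_mounts /proc/self/mountinfo
```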
  
>Thursday, November 19, 2020 3:25 AM +09:00 from Michal Koutný <mkoutny at suse.com>:
> 
>Thanks for the details.
>
>On Mon, Nov 16, 2020 at 09:30:20PM +0300, Andrei Enshin < b1os at bk.ru > wrote:
>> I see the kubelet crash with error: «Failed to start ContainerManager failed to initialize top level QOS containers: root container [kubepods] doesn't exist»
>> details:  https://github.com/kubernetes/kubernetes/issues/95488
>I skimmed the issue and noticed that your setup uses 'cgroupfs' cgroup
>driver. As explained in the other messages in this thread, it conflicts
>with systemd operation over the root cgroup tree.
>
>> I can see same two mounts of named systemd hierarchy from shell on the same node, simply by `$ cat /proc/self/mountinfo`
>> I think kubelet is running in the «main» mount namespace which has weird named systemd mount.
>I assume so as well. It may be a residual inside kubelet context when
>environment was prepared for a container spawned from within this
>context.
>
>> I would like to reproduce such weird mount to understand the full
>> situation and make sure I can avoid it in future.
>I'm afraid you may be seeing results of various races between systemd
>service (de)activation and container spawnings under the "shared" root
>(both of which comprise cgroup creation/removal and migrations).
>There's a reason behind the cgroup subtree delegation.
>
>So I'd say there's not much to do from systemd side now.
>
>
>Michal
>  
 
 
---
Best Regards,
Andrei Enshin
 