[systemd-devel] Rationale for mirroring cpu and systemd cgroup subsystems

Umut Tezduyar Lindskog umut at tezduyar.com
Wed Nov 5 07:00:16 PST 2014


On Wed, Nov 5, 2014 at 2:05 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Wed, 05.11.14 13:41, Umut Tezduyar Lindskog (umut at tezduyar.com) wrote:
>
>> Hi,
>>
>> What is the reasoning for not joining the cpu subsystem with the
>> systemd subsystem?
>>
>> There are a couple of ways you can mirror [1] the cpu and systemd
>> subsystems, and doing so can result in completely different cpu
>> bandwidth for processes.
>>
>> I am wondering why we don't mirror them by default.
>
> Because simply enabling a "cpu" controller for a unit already has
> effects on the processes running in it. For example, you don't get RT
> anymore, and the general scheduling is altered to schedule your entire
> group evenly against all the other groups on the same level.

Doesn't it make sense to turn it on by default and let users wanting
RT disable it? Seems like this was the case at some point -
http://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/
(Very much outdated article, we don't have ControlGroup= anymore)

>
> systemd will "mirror" a cgroup in the "cpu" hierarchy as soon as you
> set a property on it that requires the "cpu" or "cpuacct" hierarchy,
> for example CPUAccounting=, CPUShares= or CPUQuota=.

You can turn on mirroring at runtime, but as far as I know there is
no way of going back without rebooting, right?
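
For reference, one way to check whether a given process is currently
mirrored is to compare its path in the "cpu" hierarchy with its path in
the "name=systemd" hierarchy, both of which are listed in
/proc/<pid>/cgroup as hierarchy-ID:controller-list:cgroup-path. A rough
Python sketch follows; the helper names and the sample contents are
made up for illustration and are not part of systemd:

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Each non-empty line of /proc/<pid>/cgroup has the form
    hierarchy-ID:controller-list:cgroup-path
    """
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hier_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers (e.g. "cpu,cpuacct") share one path.
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def is_cpu_mirrored(text):
    """True if the cpu path matches the name=systemd path."""
    paths = parse_proc_cgroup(text)
    return "cpu" in paths and paths.get("cpu") == paths.get("name=systemd")


# Hypothetical sample contents, invented for this example.
NOT_MIRRORED = """\
4:cpu,cpuacct:/
1:name=systemd:/system.slice/foo.service
"""
MIRRORED = """\
4:cpu,cpuacct:/system.slice/foo.service
1:name=systemd:/system.slice/foo.service
"""

print(is_cpu_mirrored(NOT_MIRRORED))
print(is_cpu_mirrored(MIRRORED))
```

On a live system the same check can be run on the text read from
/proc/self/cgroup or /proc/1/cgroup.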

>
> But the general rule is: don't enable a controller for a unit, unless
> we really need to. We must make sure the tree is always as minimal as
> possible.
>
>> Not mirroring them results in PID 1, each kernel thread and each user
>> space task having the same cpu bandwidth (/sys/fs/cgroup/cpu/tasks).
>> Even worse, the cpu bandwidth PID 1 gets goes down with the number
>> of processes spawned, possibly opening ways to DoS.
>
> There has been a plan to introduce CPUFairScheduling= that you can set
> on a slice, and that will turn on the cpu controller for all children
> of that slice. Setting that on system.slice should have the desired
> effect.
>
> Regarding PID1: with the unified cgroup hierarchy it will not be
> possible to have both populated subcgroups and processes in the same
> cgroup. This means we will have to move PID 1 out of the root cgroup
> anyway, probably into some unit in "system.slice". This should fix
> your problem, I figure? This would also allow applying cgroup resource
> limits to PID 1 itself, for example to control the way it is scheduled
> against other processes.

We discussed putting systemd into its own cgroup in Germany during the
hackfest. It would solve the problem I have mentioned.
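
To put rough numbers on it, here is a back-of-the-envelope Python
sketch. It is an idealized model and not how the scheduler is
implemented: it assumes a single CPU, every task always runnable, equal
weights everywhere and no RT tasks. The function names are made up for
the example.

```python
from fractions import Fraction


def flat_share(n_tasks):
    """Share of one task when n_tasks equally weighted tasks
    all sit in the same cpu cgroup."""
    return Fraction(1, n_tasks)


def grouped_share(n_groups, tasks_in_group):
    """Share of one task when n_groups equally weighted sibling
    groups split the CPU evenly and the task's own group holds
    tasks_in_group tasks."""
    return Fraction(1, n_groups) * Fraction(1, tasks_in_group)


# PID 1 next to 99 other busy tasks in the root cpu cgroup.
print(flat_share(100))      # 1/100

# PID 1 alone in its own group, next to two other sibling groups.
print(grouped_share(3, 1))  # 1/3
```

In the flat case the share of PID 1 shrinks with every busy process
that is spawned. In the grouped case it depends only on the number of
sibling groups.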

Umut

>
> Lennart
>
> --
> Lennart Poettering, Red Hat

