[systemd-devel] [HEADSUP] cgroup changes

Kok, Auke-jan H auke-jan.h.kok at intel.com
Fri Jun 21 14:47:34 PDT 2013


On Fri, Jun 21, 2013 at 2:17 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Fri, 21.06.13 14:10, Kok, Auke-jan H (auke-jan.h.kok at intel.com) wrote:
>
>> > So, in the future, when you have some service, and that service wants to
>> > alter some cgroup resource limits for itself (let's say: set its own cpu
>> > shares value to 1500), this is what should happen: the service should
>> > use a call like sd_pid_get_unit() to get its own unit name, and then use
>> > dbus to invoke SetCPUShares(1500) for that service. systemd will then do
>> > the rest. (*)
>> >
>> > Lennart
>> >
>> > (*) to make this even simpler we have been thinking of defining a new
>> > "virtual" bus object path /org/freedesktop/systemd1/self/ or so which
>> > will always point to the caller's own unit. This would be similar to
>> > /proc/self/ which also points to its own PID dir for each
>> > process... With that in place you could then set any resource setting
>> > you want with a single bus method call.
>>
>> This is fine for applications that manage themselves, but I'm seeing
>> more interest in use cases where we want external influence on cgroup
>> hierarchies, for instance:
>>
>> - foreground/background priorities: a window manager marks background
>> applications, puts them in the freezer and changes their oom_score_adj
>> so that old apps get cleaned up automatically when memory
>> availability is low.
>> - detecting runaway apps and taking CPU slices away from them.
>> - thermally constraining classes of applications
>>
>> Those would be tasks that an external process would do by manipulating
>> properties of cgroups, not something each task would do on its own.
>>
>> Do you suggest these manipulations should be implemented without high
>> level systemd APIs and the "controller" just manipulates the cgroups
>> directly?
>
> All changes to cgroup attributes must go through systemd. If the WM
> wants to freeze or adjust OOM he needs to issue systemd bus calls for
> that.
>
> I can't follow the runaway stuff. The kernel will distribute CPU
> evenly among running apps if all of them want it, so I don't see why
> more monitoring is needed.
>
> I guess the thermal stuff is probably best done in-kernel... Too
> dangerous/latency-sensitive for userspace, no?
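The self-service flow Lennart describes above (look up your own unit name, then ask systemd over D-Bus to change a resource setting) can be sketched as follows. This is a minimal sketch under stated assumptions, not systemd's own code: the hypothetical helper `unit_from_proc_cgroup()` parses `/proc/self/cgroup` as a stand-in for `sd_pid_get_unit()` from libsystemd, and instead of the per-property `SetCPUShares(1500)` call floated in the thread it builds a `busctl` call to the Manager's `SetUnitProperties()` method as found in current systemd. `CPUShares` matches the thread; newer systemd releases deprecate it in favor of `CPUWeight`.

```python
import shlex
from typing import List, Optional

# Unit types that own processes; slices only group other units, so they are skipped.
UNIT_SUFFIXES = (".service", ".scope")


def unit_from_proc_cgroup(text: str) -> Optional[str]:
    """Hypothetical stand-in for libsystemd's sd_pid_get_unit().

    Scans the contents of /proc/<pid>/cgroup and returns the innermost
    .service or .scope path component of the systemd hierarchy, or None.
    """
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy, controllers, path = parts
        unified = hierarchy == "0" and controllers == ""  # cgroup v2: "0::<path>"
        named = controllers == "name=systemd"             # cgroup v1 systemd tree
        if not (unified or named):
            continue
        # Walk from the leaf upwards so user@1000.service/app.slice/foo.service
        # yields foo.service rather than user@1000.service.
        for component in reversed(path.split("/")):
            if component.endswith(UNIT_SUFFIXES):
                return component
    return None


def set_cpu_shares_argv(unit: str, shares: int) -> List[str]:
    """busctl argv for Manager.SetUnitProperties(s name, b runtime, a(sv) props).

    runtime=true asks systemd not to persist the change across reboots.
    """
    return [
        "busctl", "call",
        "org.freedesktop.systemd1",          # bus name
        "/org/freedesktop/systemd1",         # manager object path
        "org.freedesktop.systemd1.Manager",  # interface
        "SetUnitProperties",
        "sba(sv)", unit, "true",
        "1", "CPUShares", "t", str(shares),  # one (name, variant<uint64>) pair
    ]


# Usage with a sample cgroup v2 file; building the call needs no bus access.
sample = "0::/system.slice/example.service\n"
unit = unit_from_proc_cgroup(sample)
print(shlex.join(set_cpu_shares_argv(unit, 1500)))
```

The same argument vector also covers an external controller such as a window manager: it passes another unit's name instead of its own, which is the "go through systemd" route Lennart asks for. The sketch only builds the command; actually running it (for example with `subprocess.run`) needs a system bus and suitable privileges.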

Only userspace (e.g. the WM) can distinguish between a foreground and
a background application, decide that the CPU consumption of certain
background apps is excessive, and throttle them down further. That is
basically a milder version of using the freezer to stop them entirely,
much like a SIGSTOP.
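To make the proportional-sharing point concrete: with the cgroup v1 `cpu` controller, sibling groups that all want CPU receive time in proportion to their `cpu.shares` values (default 1024), and an idle group leaves its portion to the busy ones. The sketch below is a simplified model of that rule, not the CFS scheduler itself; the group names and the background value of 128 are made-up examples. It shows that lowering a background app's shares only throttles it while something else is competing for the CPU.

```python
from typing import Dict, Set

DEFAULT_CPU_SHARES = 1024  # cgroup v1 cpu.shares default


def cpu_split(shares: Dict[str, int], busy: Set[str]) -> Dict[str, float]:
    """Fraction of CPU each sibling group gets under full contention.

    Only busy groups compete; an idle group gets 0.0 and its portion is
    redistributed among the busy ones in proportion to their shares.
    """
    total = sum(shares[g] for g in busy)
    return {g: (shares[g] / total if g in busy else 0.0) for g in shares}


# Hypothetical example: a foreground app at the default, a background app cut to 128.
groups = {"foreground": DEFAULT_CPU_SHARES, "background": 128}

both = cpu_split(groups, {"foreground", "background"})
print({g: round(f, 3) for g, f in both.items()})
# → {'foreground': 0.889, 'background': 0.111}

alone = cpu_split(groups, {"background"})
print(alone)
# → {'foreground': 0.0, 'background': 1.0}
```

The second call is the case behind Auke's request: once the foreground app goes idle, low shares alone no longer hold the background app back, which is why a stronger tool such as the freezer comes up.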

Thermal throttling from userspace allows you to distinguish between
"never make my SETI turn the fan on" and "throttle the entire system
when I reach high fan speeds". You can't do that in the kernel. [1]
Arguably this could be done in-task rather than by an external
controller, but then you are trusting the task to do the right thing,
which you may not want to do.


Auke


[1] Note that the new Intel P-state driver by Dirk Brandewie changes
how things work with nice(). The old behaviour was abused by folks
running bitcoin miners at nice values, which caused the ondemand
governor to do something irrational: nice-only tasks would keep the
CPU at its lowest frequencies. That is terrible from a power
perspective, because every daemon running at a nice value then takes
much longer to complete its task, burning more energy than if it had
raced to idle.

