[systemd-devel] [HEADSUP] cgroup changes

Mon Jun 24 12:24:38 PDT 2013

On Mon, Jun 24, 2013 at 12:10 PM, Tejun Heo <tj at kernel.org> wrote:
> Hello, Andy.
>
> On Mon, Jun 24, 2013 at 11:49:05AM -0700, Andy Lutomirski wrote:
>> > I have an idea where it should be headed in the long term but am not
> >> > sure about the short-term solution.  Given that the only sort of wide-spread
>> > use case is virt kthreads, maybe it just needs to be special cased for
>> > now.  Not sure.
>>
>> I'll be okay (I think) if I can reliably set affinities of these
>> threads.  I'm currently doing it with cgroups.
>>
>> That being said, I don't like the direction that kernel thread magic
>> affinity is going.  It may be great for cache performance and reducing
> >> random bouncing, but I have a scheduling-jitter-sensitive workload and
>> I don't care about overall system throughput.  I need the kernel to
>> stay the f!&k off my important cpus, and arranging for this to happen
>> is becoming increasingly complicated.
>
> Why is it becoming increasingly complicated?  The biggest change
> probably was the shared workqueue pool implementation but that was
> years ago and workqueue has grown pool attributes recently adding more
> properly designed flexibility and, for example, adding default
> affinity for !per-cpu workqueues should be pretty easy now.  But
> anyways, if it's an issue, it should be examined and properly solved
> rather than hacking up a hacky solution with cgroup.

Because more things are becoming per-cpu without any option to move
the work done on behalf of one cpu onto another cpu.  RCU is a nice
exception.

>
>> cgroups are most certainly something that a binary can be aware of.
>> It's not like a sysctl knob at all -- it's per process.  I have lots
>
> No, it definitely is not.  Sure it is more granular than sysctl but
> that's it.  It exposes control knobs which are directly tied into
> kernel implementation details.  It is not a properly designed
> programming API by any stretch of imagination.  It is an extreme
> failure on the kernel side that that part hasn't been made crystal
> clear from the beginning.  I don't know how intentional it was but the
> whole thing is completely botched.
>
> cgroup *never* was held to the standard necessary for any widely
> available API and many of the controls it exposes are exactly at the
> level of sysctls.  As the interface was filesystem, it could evade
> scrutiny and with the hierarchical organization also gave the
> impression that it's something which can be used directly by
> individual applications.  It found a loophole in the way we implement
> and police kernel APIs and then exploited it like there's no tomorrow.
>
> We are firmly bound to maintain what already has been exposed from the
> kernel side and I'm not gonna break any of them but the free-for-all
> cgroup is broken and deprecated.  It's gonna wither and fade away and
> any attempt to reverse that will be met with extreme prejudice.

The functionality I care about is that a program can reliably and
hierarchically subdivide system resources -- think rlimits but
actually useful.  I, and probably many other things, want this
functionality.  Yes, the current cgroup interface is awful, but it
gets one thing right: it's a hierarchy.
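
To make "hierarchically subdivide" concrete, here is a minimal sketch of
the property in question: a group can hand part of its budget to a child,
and the child can never exceed what its ancestors allow.  This is a toy
model only.  The ResourceGroup class and its subdivide() and
effective_limit() methods are hypothetical names invented for
illustration, not an existing or proposed kernel or systemd interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResourceGroup:
    """Hypothetical node in a resource hierarchy (not a real kernel API)."""
    name: str
    # Local cap in arbitrary units (e.g. bytes); None means no local cap.
    limit: Optional[int] = None
    parent: Optional["ResourceGroup"] = field(default=None, repr=False)
    children: Dict[str, "ResourceGroup"] = field(default_factory=dict)

    def subdivide(self, name: str, limit: Optional[int] = None) -> "ResourceGroup":
        """Create a child group nested under this one."""
        child = ResourceGroup(name=name, limit=limit, parent=self)
        self.children[name] = child
        return child

    def effective_limit(self) -> Optional[int]:
        """Return the tightest cap on the path from this group to the root."""
        caps = []
        node = self
        while node is not None:
            if node.limit is not None:
                caps.append(node.limit)
            node = node.parent
        return min(caps) if caps else None


GIB = 1024 ** 3
root = ResourceGroup("app", limit=8 * GIB)
workers = root.subdivide("workers", limit=6 * GIB)
# A child may ask for more than its parent has; the parent's cap still wins.
job = workers.subdivide("job-1", limit=16 * GIB)
print(job.effective_limit() == 6 * GIB)
```

A flat per-process limit such as an rlimit cannot express the last case:
nothing ties the limit on "job-1" to the budget of the group that
created it.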

Back when my software ran on Windows, I used the awful "job" interface
to allocate resources among different parts of my software.  When I
switched to Linux, I lost some of that functionality and replaced
other bits with cgroups.  It's hackish, but it works.

Now we're apparently moving toward having a unified hierarchy
(great!), a more sane API (great!), and a nasty userspace situation
where systemd-using systems control the hierarchy through a highly
limiting systemd-specific interface and non-systemd systems do
something else which will presumably look nothing like what systemd
does.

I would argue that designing a kernel interface that requires exactly
one userspace component to manage it and ties that one userspace
component to something that can't easily be deployed everywhere (the
init system) is as big a cheat as the old approach of sneaking bad
APIs in through a filesystem was.

IOW, when designing this, please specify an API that programs are
permitted to use, and let that API be reviewed.

--Andy