[systemd-devel] [HEADSUP] cgroup changes

Mon Jun 24 12:10:00 PDT 2013

Hello, Andy.

On Mon, Jun 24, 2013 at 11:49:05AM -0700, Andy Lutomirski wrote:
> > I have an idea where it should be headed in the long term but am not
> > sure about the short-term solution.  Given that the only sort of wide-spread
> > use case is virt kthreads, maybe it just needs to be special cased for
> > now.  Not sure.
> 
> I'll be okay (I think) if I can reliably set affinities of these
> threads.  I'm currently doing it with cgroups.
> 
> That being said, I don't like the direction that kernel thread magic
> affinity is going.  It may be great for cache performance and reducing
> random bouncing, but I have a scheduling-jitter-sensitive workload and
> I don't care about overall system throughput.  I need the kernel to
> stay the f!&k off my important cpus, and arranging for this to happen
> is becoming increasingly complicated.

Why is it becoming increasingly complicated?  The biggest change
probably was the shared workqueue pool implementation, but that was
years ago, and workqueue recently grew pool attributes which add more
properly designed flexibility.  For example, adding default affinity
for !per-cpu workqueues should be pretty easy now.  But anyways, if
it's an issue, it should be examined and properly solved rather than
hacking up a hacky solution with cgroup.

> cgroups are most certainly something that a binary can be aware of.
> It's not like a sysctl knob at all -- it's per process.  I have lots

No, it definitely is not.  Sure it is more granular than sysctl but
that's it.  It exposes control knobs which are directly tied into
kernel implementation details.  It is not a properly designed
programming API by any stretch of the imagination.  It is an extreme
failure on the kernel side that that part hasn't been made crystal
clear from the beginning.  I don't know how intentional it was but the
whole thing is completely botched.

cgroup *never* was held to the standard necessary for any widely
available API and many of the controls it exposes are exactly at the
level of sysctls.  As the interface was a filesystem, it could evade
scrutiny, and its hierarchical organization also gave the
impression that it's something which can be used directly by
individual applications.  It found a loophole in the way we implement
and police kernel APIs and then exploited it like there's no tomorrow.

We are firmly bound to maintain what already has been exposed from the
kernel side and I'm not gonna break any of them but the free-for-all
cgroup is broken and deprecated.  It's gonna wither and fade away and
any attempt to reverse that will be met with extreme prejudice.

> of binaries that have worked quite well for a couple years that move
> themselves into different cgroups.  I have no problem with a unified
> hierarchy, but I need control of my little piece of the hierarchy.
> 
> I don't care if the interface to do so changes, but the basic
> functionality is important.

Whether you care or not is completely irrelevant.  Individual binaries
widely incorporating cgroup details automatically binds the kernel.
It becomes excruciatingly painful to back out after a certain point.  I
don't think we're there yet given the overall immaturity and brokenness
of cgroups and it's imperative that we back the hell out as fast as
possible before this insanity spreads any wider.

Thanks.

-- 
tejun