[systemd-devel] [HEADSUP] cgroup changes

Andy Lutomirski luto at amacapital.net
Mon Jun 24 11:49:05 PDT 2013


On Mon, Jun 24, 2013 at 11:38 AM, Tejun Heo <tj at kernel.org> wrote:
> Hello,
>
> On Mon, Jun 24, 2013 at 03:27:15PM +0200, Lennart Poettering wrote:
>> On Sat, 22.06.13 15:19, Andy Lutomirski (luto at amacapital.net) wrote:
>>
>> > 1. I put the entire world into a separate, highly constrained
>> > cgroup.  My real-time code runs outside that cgroup.  This seems to be
>> > exactly what slices are for, but I need kernel threads to go into
>> > the constrained cgroup.  Will systemd support this?
>>
>> I am not sure whether the ability to move kernel threads into cgroups
>> will stay around at all, from the kernel side. Tejun, can you comment on this?
>
> Any kernel threads with PF_NO_SETAFFINITY set already can't be removed
> from the root cgroup.  In general, I don't think moving kernel threads
> into !root cgroups is a good idea.  They're in most cases shared
> resources and userland doesn't really have much idea what they're
> actually doing, which is the fundamental issue.
>
> Which kthreads are running on the kernel side and what they're doing
> is strictly an implementation detail of the kernel.  There's no
> effort from the kernel side in keeping them stable and userland is likely
> to get things completely wrong - e.g. many kernel threads named after
> workqueues in any recent kernels don't actually do anything until the
> system is under heavy memory pressure.  Userland can't tell and has no
> control over what's being executed where at all and that's the way it
> should be.
>
> That said, there are cases where certain async executions are
> concretely bound to userland processes - say, (planned) aio updates,
> virt drivers and so on.  Right now, virt implements something pretty
> hacky but I think they'll have to be tied closer to the usual process
> mechanism - ie. they should be saying that these kthreads are serving
> this process and should be treated as such in terms of resource
> control rather than the current "move this kthread to this set of
> cgroups, don't ask why" thing.  Another not-well-thought-out aspect of
> the current cgroup.  :(
>
> I have an idea where it should be headed in the long term but am not
> sure about a short-term solution.  Given that the only sort of widespread
> use case is virt kthreads, maybe it just needs to be special cased for
> now.  Not sure.

I'll be okay (I think) if I can reliably set affinities of these
threads.  I'm currently doing it with cgroups.

That being said, I don't like the direction that kernel thread magic
affinity is going.  It may be great for cache performance and reducing
random bouncing, but I have a scheduling-jitter-sensitive workload and
I don't care about overall system throughput.  I need the kernel to
stay the f!&k off my important cpus, and arranging for this to happen
is becoming increasingly complicated.
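
For illustration only, here is a minimal sketch of what setting a thread's
affinity directly (rather than through cgroups) and checking whether the
kernel will allow it might look like.  The helper names task_flags and
try_pin are hypothetical, and the flag values are assumed to match the
kernel's include/linux/sched.h; check them against your own kernel headers
before relying on them.

```python
import os

# Flag values assumed from the kernel's include/linux/sched.h;
# verify against the headers of the kernel you are running.
PF_KTHREAD = 0x00200000
PF_NO_SETAFFINITY = 0x04000000


def task_flags(stat_line):
    """Return the flags field (field 9) of a /proc/<pid>/stat line."""
    # comm (field 2) is parenthesised and may itself contain spaces or
    # parentheses, so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is state (field 3), so field 9 is rest[6].
    return int(rest[6])


def try_pin(pid, cpus):
    """Try to restrict pid to the given CPUs; return True on success."""
    try:
        os.sched_setaffinity(pid, cpus)
    except OSError:
        # Typically EINVAL for PF_NO_SETAFFINITY kthreads, EPERM without
        # sufficient privileges, ESRCH if the task has already exited.
        return False
    return True
```

A caller could read /proc/<pid>/stat for each task, skip those with
PF_NO_SETAFFINITY set, and call try_pin() on the rest with the set of
CPUs that are not reserved for the real-time work.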

>
>> > 2. I manage services and tasks outside systemd (for one thing, I
>> > currently use Ubuntu, but even if I were on Fedora, I have a bunch
>> > of fine-grained things that figure out how they're supposed to
>> > allocate resources, and porting them to systemd just to keep working
>> > in the new world order would be a PITA [1]).
>> >
>> > (cgroups have the odd feature that they are per-task, not per thread
>> > group, and the systemd proposal seems likely to break anything that
>> > actually wants task granularity.  I may actually want to use this,
>> > even though it's a bit evil -- my real-time thread groups have
>> > non-real-time threads.)
>>
>> Here too, Tejun is pretty keen on removing the ability of splitting up
>> threads into cgroups from the kernel, and will only allow this
>> per-process. Tejun, please comment!
>
> Yes, again, the biggest issue is how much of low-level cgroup details
> become known to individual programs.  Splitting threads into different
> cgroups would in most cases mean that the binary itself would become
> aware of cgroups and it's akin to burying sysctl knob tunings into
> individual binaries.  cgroup is not an interface for each individual
> program to fiddle with.  If certain thread-granular control is
> absolutely necessary and justifiable, it's something to be added to
> the existing thread API, not something to be bolted on using cgroups.
>

cgroups are most certainly something that a binary can be aware of.
It's not like a sysctl knob at all -- it's per process.  I have lots
of binaries that have worked quite well for a couple years that move
themselves into different cgroups.  I have no problem with a unified
hierarchy, but I need control of my little piece of the hierarchy.

I don't care if the interface to do so changes, but the basic
functionality is important.
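
As a rough sketch of the kind of thing meant here (not the actual code of
any of the binaries mentioned), a process can move itself into a cgroup by
writing its PID into a control file in that cgroup's directory.  The
function name join_cgroup and the directory argument are hypothetical; the
mount point and layout of the hierarchy depend on the system.

```python
import os


def join_cgroup(cgroup_dir, pid=None, procs_file="cgroup.procs"):
    """Move pid (default: the calling process) into the cgroup at cgroup_dir.

    cgroup_dir is an assumed path to an existing cgroup directory that the
    caller has permission to write to.
    """
    if pid is None:
        pid = os.getpid()
    path = os.path.join(cgroup_dir, procs_file)
    # Writing a PID to cgroup.procs migrates the whole thread group.
    # On the v1 hierarchy, writing a TID to "tasks" moves a single thread.
    with open(path, "w") as f:
        f.write(str(pid))
    return path
```

Passing procs_file="tasks" gives the per-task behaviour discussed above;
the default moves the whole process.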

> So, I'm quite strongly against allowing splitting threads of
> the same process into different cgroups.

I don't need that feature.  (Which is not to say that no one else does.)

--Andy

