[systemd-devel] [HEADSUP] cgroup changes

Brian Bockelman bbockelm at cse.unl.edu
Tue Jun 25 06:31:27 PDT 2013


On Jun 25, 2013, at 4:56 AM, Lennart Poettering <lennart at poettering.net> wrote:

> On Tue, 25.06.13 02:21, Brian Bockelman (bbockelm at cse.unl.edu) wrote:
> 
>> A few questions came to mind which may provide interesting input 
>> to your design process:
>> 1) I use cgroups heavily for resource accounting.  Do you envision 
>>  me querying via dbus for each accounting attribute?  Or do you 
>>  envision me querying for the cgroup name, then accessing the 
>> controller statistics directly?
> 
> Good question. Tejun wants systemd to cover that too. I am not entirely
> sure. I don't like the extra roundtrip for measuring the accounting
> bits. But maybe we can add a library that avoids the roundtrip, and
> simply provides you with high-level accounting values for cgroups. That
> way, for *changing* things you'd need to go via the bus, for *reading*
> things we'd give you a library that goes directly to the cgroupfs and
> avoids the roundtrip.

I like this idea.  Hopefully single-writer, multiple-reader is a more sustainable path forward.
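
For illustration, here is a minimal sketch of the "read directly from the cgroupfs" half of that split. It assumes the cgroup v1 layout with the memory and cpuacct controllers mounted under /sys/fs/cgroup; the helper names and the "batch/job1" path are hypothetical and are not part of any systemd library.

```python
import os


def read_cgroup_counter(cgroup_dir, attribute):
    """Read one integer counter from a cgroup controller file."""
    with open(os.path.join(cgroup_dir, attribute)) as f:
        return int(f.read().strip())


def job_accounting(job_cgroup,
                   memory_root="/sys/fs/cgroup/memory",
                   cpuacct_root="/sys/fs/cgroup/cpuacct"):
    """Collect accounting values for one job without a bus roundtrip.

    The mount points are assumptions (cgroup v1 defaults on many
    distributions); job_cgroup is the cgroup path relative to them.
    """
    rel = job_cgroup.lstrip("/")
    return {
        # Current memory usage in bytes (cgroup v1 memory controller).
        "memory_bytes": read_cgroup_counter(
            os.path.join(memory_root, rel), "memory.usage_in_bytes"),
        # Cumulative CPU time in nanoseconds (cgroup v1 cpuacct controller).
        "cpu_ns": read_cgroup_counter(
            os.path.join(cpuacct_root, rel), "cpuacct.usage"),
    }
```

A caller would only need the cgroup name (for example from a bus query) and could then poll job_accounting("batch/job1") as often as it likes.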

What about the notification APIs?  We currently use memory.oom_control to get a notification when a job hits its memory limit (this lets us know the job died due to memory issues, as the user code itself typically just dies with SIGSEGV).  Is subscribing to notifications considered reading or writing in this case?
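
To make the question concrete, here is a rough sketch of how an OOM notification is registered under cgroup v1. It assumes Python 3.10+ on Linux (for os.eventfd); the function names are hypothetical. The point is that subscribing requires a write to cgroup.event_control, even though the caller only wants to observe.

```python
import os


def event_control_line(event_fd, target_fd):
    """Format the registration string written to cgroup.event_control."""
    return f"{event_fd} {target_fd}"


def wait_for_oom(cgroup_dir):
    """Block until the memory cgroup at cgroup_dir reports an OOM event.

    cgroup_dir is assumed to be a cgroup v1 memory controller directory,
    e.g. somewhere under /sys/fs/cgroup/memory.
    """
    efd = os.eventfd(0)
    oom_fd = os.open(os.path.join(cgroup_dir, "memory.oom_control"),
                     os.O_RDONLY)
    try:
        # Registration is a write into the cgroup, not a read.
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, oom_fd))
        # Blocks until the kernel signals the eventfd.
        return os.eventfd_read(efd)
    finally:
        os.close(oom_fd)
        os.close(efd)
```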

> 
>> 2) I currently fork and setup the resource environment (namespaces, 
>>  environment, working directory, etc).  Can an appropriately privileged 
>>  process create a sub-slice, place itself in it, and then drop privs 
>> / exec?
> 
> We'll probably have a way how you can take an existing set of processes
> and turn them dynamically into a new unit in systemd. These units would
> be mostly like service units, except that systemd wouldn't start the
> processes, but they would be "foreign" created. We are not sure about
> the name for this yet (i.e. whether to cover it under the ".service"
> suffix, but we'll probably call it "Scopes" instead, with the suffix
> ".scope").
> 
> The scope units could then be manipulated at runtime for (cgroup based)
> resource management the way normal services are too.
> 
> So basically, a service unit could be assigned to a slice unit, and
> could then create "scope" units which detach subprocesses from the
> original service unit, and get their own cgroup in the same slice or any
> other.
> 

This sounds manageable.

> 
>> 5) Will I be able to delegate management of a subslice to a
>> non-privileged user?
> 
> Unlikely, at least for the beginning. 
> 

(Very) long-term, this is attractive for us.  We prefer the batch system to run as unprivileged when possible (and to sacrifice the minimal amount of functionality to do so!).

>> I'm excited to see new ideas (again, having system tools be aware of 
>> the batch system activity is intriguing [2]), but am a bit worried about
>> losing functionality and the cost of porting things to the new era!
> 
> There's certainly going to be some lost flexibility. But of course we'll
> try to cover all interesting usecases.

I'll try to lurk and provide guidance about how we nutty batch system folks may try to use it.

> 
>> [2] Hopefully something that works better than 
>> "ps xawf -eo pid,user,cgroup,args" which currently segfaults for me :(
> 
> Hmm, could you file a bug, please?
> 

Couldn't figure out a patch -- too little time.  However, I at least tracked down the offending code.  Bug report is here:

https://bugzilla.redhat.com/show_bug.cgi?id=977854

Thanks,

Brian
