[systemd-devel] [HEADSUP] cgroup changes

Mon Jun 24 19:21:14 PDT 2013

Lennart Poettering <lennart <at> poettering.net> writes:

> 
> 2) This hierarchy becomes private property of systemd. systemd will set
> it up. Systemd will maintain it. Systemd will rearrange it. Other
> software that wants to make use of cgroups can do so only through
> systemd's APIs. This single-writer logic is absolutely necessary, since
> interdependencies between the various controllers, the various
> attributes, the various cgroups are non-obvious and we simply cannot
> allow that cgroup users alter the tree independently of each other
> forever. Due to all this: The "Pax Cgroup" document is a thing of the
> past, it is dead.
> 

Hi [1],

I currently contribute cgroup support to a batch system 
(http://research.cs.wisc.edu/htcondor/) and 
am trying to figure out how this will affect me.

Right now, I take the resources provided by the cgroup set up by the
sysadmin and sub-divide them amongst the running jobs.
Cgroups are used for resource management, resource accounting, and job
management (using the freezer controller to deliver signals to all
processes at once).  Jobs last anywhere from seconds to hours; a setup
time of, say, several hundred milliseconds is acceptable, as long as
we can easily create and destroy many jobs.
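
For concreteness, the freezer-based signal delivery mentioned above can be
sketched roughly as follows.  This is an illustrative sketch against the
cgroup v1 freezer files (freezer.state, cgroup.procs), not the batch
system's actual code; the function names, the timeout, and the injectable
`kill` parameter are assumptions made for the example.

```python
import os
import signal
import time
from pathlib import Path


def wait_for_state(state_file, wanted, timeout=5.0):
    """Poll freezer.state until it reads `wanted` (freezing is asynchronous)."""
    deadline = time.monotonic() + timeout
    while state_file.read_text().strip() != wanted:
        if time.monotonic() > deadline:
            raise TimeoutError(f"cgroup did not reach state {wanted}")
        time.sleep(0.01)


def signal_cgroup(cgroup_dir, signum=signal.SIGTERM, kill=os.kill):
    """Freeze a cgroup, signal every process in it, then thaw it.

    cgroup_dir is the job's directory in the freezer hierarchy; the
    caller decides where that is (an assumption of this sketch).
    """
    cgroup_dir = Path(cgroup_dir)
    state = cgroup_dir / "freezer.state"
    state.write_text("FROZEN")
    try:
        wait_for_state(state, "FROZEN")
        # While frozen, no process can fork, so this PID list is complete.
        pids = [int(p) for p in (cgroup_dir / "cgroup.procs").read_text().split()]
        for pid in pids:
            try:
                kill(pid, signum)
            except ProcessLookupError:
                pass  # process is already gone
    finally:
        # Always thaw, otherwise the job stays stuck; queued signals are
        # delivered once the processes run again.
        state.write_text("THAWED")
    return pids
```

Freezing first is what makes the delivery atomic: nothing in the job can
fork or exit between listing the PIDs and signalling them.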

A few questions came to mind which may provide interesting input 
to your design process:
1) I use cgroups heavily for resource accounting.  Do you envision 
  me querying via dbus for each accounting attribute?  Or do you 
  envision me querying for the cgroup name, then accessing the 
  controller statistics directly?
2) I currently fork and set up the resource environment (namespaces,
  environment, working directory, etc).  Can an appropriately privileged
  process create a sub-slice, place itself in it, and then drop
  privileges and exec?
3) More generally, will I be able to interact with slices directly, or 
  will I need to create throw-away units and launch them via systemd 
  (versus a "normal" fork/exec)?
    - The latter causes quite a bit of anxiety for me - we currently 
      support many POSIX platforms plus Windows (hey - at least 
      we dropped HPUX) and I'd like to avoid a completely independent 
      code path for spawning jobs on Linux.
4) Will many short-lived jobs cause any heartache?  Would anything 
  untoward happen to my system if I spawned / destroyed jobs (and 
  corresponding units or slices) at, say, 1Hz?
5) Will I be able to delegate management of a sub-slice to a
  non-privileged user?
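
To make the second option in question 1 concrete: the idea is to look up
the cgroup path once and then read the controller's statistics files
directly.  A minimal sketch, assuming cgroup v1 with the cpuacct
controller reachable under /sys/fs/cgroup/cpuacct (the mount location and
both function names are assumptions for illustration):

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Each line of /proc/<pid>/cgroup has the form
    "hierarchy-ID:controller-list:cgroup-path".
    """
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hier_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            paths[name] = path
    return paths


def read_cpu_usage_ns(pid="self", mount_root="/sys/fs/cgroup"):
    """Return cumulative CPU time (nanoseconds) of the cgroup holding `pid`."""
    with open(f"/proc/{pid}/cgroup") as f:
        paths = parse_proc_cgroup(f.read())
    # mount_root is an assumption; the sysadmin may mount cpuacct elsewhere.
    stat_file = (Path(mount_root) / "cpuacct"
                 / paths["cpuacct"].lstrip("/") / "cpuacct.usage")
    return int(stat_file.read_text())
```

With something like this, the only thing needed from systemd is the
cgroup name; the per-attribute reads stay plain file reads.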

I'm excited to see new ideas (again, having system tools be aware of 
the batch system activity is intriguing [2]), but am a bit worried about
losing functionality and the cost of porting things to the new era!

Thanks!

Brian

[1] Apologies if the reply comes through mangled; I am posting through
  the gmane web interface.
[2] Hopefully something that works better than 
 "ps xawf -eo pid,user,cgroup,args" which currently segfaults for me :(