[systemd-devel] [systemd][cgroup in container] problem with cgroup hierarchy in container

Daniel P. Berrange berrange at redhat.com
Fri Mar 7 01:39:45 PST 2014


On Thu, Mar 06, 2014 at 07:54:05PM +0100, Lennart Poettering wrote:
> On Thu, 06.03.14 16:55, Dariusz Michaluk (d.michaluk at samsung.com) wrote:
> 
> > 
> > On 05.03.2014 19:16, Lennart Poettering wrote:
> > >nspawn and libvirt-lxc mostly follow the same code paths and register
> > >via machined... So it's weird that different things happen. Somehow the
> > >systemd instance inside the container must be confused about the cgroup
> > >it is running in...
> > 
> > A few more cents. I noticed that when I run a libvirt-lxc container I
> > get the warning "Failed to install release agent, ignoring: No such file
> > or directory", which does not occur when I use nspawn.
> 
> Oh!
> 
> Hmm, that suggests that libvirt-lxc might not mount the naked cgroupfs
> tree to /sys/fs/cgroup/systemd, but only a subdirectory. This of course
> might cause the weird setup that the host tree is "duplicated" for the
> container!
> 
> Unfortunately it is not possible to only mount a subtree of the cgroup
> hierarchy into the container, since then the data from /proc/self/cgroup
> won't match /sys/fs/cgroup/systemd anymore... Also, the root of the
> cgroup trees has slightly different semantics and more properties than
> the children.
> 
> Is this the default setup of libvirt-lxc for those dirs? I figure we
> should talk to Daniel to get that changed...

Yeah, that was set up that way a while ago, but I forgot this would
invalidate the /proc/self/cgroup information. It was a bit of a poor
man's attempt at securing cgroups, but really it is just a waste
of time unless user namespaces are available. Can someone file a
bug against libvirt for this, and we'll look at not doing this.
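
To make the mismatch concrete, here is a minimal Python sketch (not
libvirt or systemd code) that checks whether the path reported in
/proc/self/cgroup for the name=systemd hierarchy actually exists below
the mount point. The helper names are made up for illustration, and the
sketch assumes the cgroup v1 line layout "ID:controllers:path".

```python
import os


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into {controller-list: cgroup path}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "hierarchy-ID:controller-list:cgroup-path".
        _hier_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def cgroup_path_visible(proc_text, mount_root, controllers="name=systemd"):
    """Return True if the reported cgroup path exists under mount_root.

    If only a subtree of the hierarchy is mounted at mount_root, the path
    from /proc/self/cgroup is still relative to the real root, so this
    lookup is expected to fail.
    """
    paths = parse_proc_cgroup(proc_text)
    if controllers not in paths:
        return False
    relative = paths[controllers].lstrip("/")
    return os.path.isdir(os.path.join(mount_root, relative))


if __name__ == "__main__":
    with open("/proc/self/cgroup") as f:
        print(cgroup_path_visible(f.read(), "/sys/fs/cgroup/systemd"))
```

Run inside a container, a False result would be consistent with only a
subdirectory having been mounted, though it does not prove it.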

> Each container really needs to see the full tree. The best thing
> possible to make sure that the containers can't muck with anything
> outside of the tree is to mount the upper parts read-only with a bind
> mount, but other than that I don't see that we could do anything
> there...

User namespaces are the best bet here. Once the root UID is remapped,
containers won't be able to move themselves out of their subtree.
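
As a rough way to check whether root is remapped, the sketch below
parses /proc/self/uid_map, whose lines have the form "inside-ID
outside-ID length". It is only an illustration in Python; the function
names are hypothetical and it says nothing about how libvirt sets up
the mapping.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(x) for x in fields)
        entries.append((inside, outside, length))
    return entries


def map_to_parent(entries, inside_id):
    """Translate an ID inside the namespace to the outside ID, or None."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # the ID has no mapping in this namespace


def root_is_remapped(text):
    """Return True if UID 0 inside maps to a non-zero UID outside."""
    outside = map_to_parent(parse_id_map(text), 0)
    return outside is not None and outside != 0


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print(root_is_remapped(f.read()))
```

On a host without user namespaces in use, the map is normally the
identity mapping and the check reports False.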


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

