[systemd-devel] Unable to run systemd in an LXC / cgroup container.

Lennart Poettering lennart at poettering.net
Thu Oct 25 14:38:09 PDT 2012


On Thu, 25.10.12 11:59, Michael H. Warfield (mhw at WittsEnd.com) wrote:

> > http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
> 
> > Unfortunately, in our case, merely getting a mount in there is a
> > complication in that it also has to be populated but, at least, we
> > understand the problem set now.
> 
> Ok...  Serge and I were corresponding on the lxc-users list and he had a
> suggestion that worked but I consider to be a bit of a sub-optimal
> workaround.  Ironically, it was to mount devtmpfs on /dev.  We don't
> (currently) have a method to auto-populate a tmpfs mount with the needed
> devices and this provided it.  It does have a problem that makes me
> uncomfortable in that the container now has visibility into the
> hosts /dev system.  I'm a security expert and I'm not comfortable with
> that "solution" even with the controls we have.  We can control access
> but still, not happy with that.

That's a pretty bad idea: access control to the device nodes in devtmpfs
is managed by the host's udev instance. That means if the group/user
lists in the container and on the host differ, you have already lost.
Also, access control in udev is dynamic, due to stuff like uaccess and
similar. You really don't want that to leak into the container, i.e.
devices changing ownership all the time with UIDs/GIDs that make no
sense at all in the container.

In general I think it's a good idea not to expose any "real" devices to
the container, but only the "virtual" ones that are programming
APIs. That means: no /dev/sda, or /dev/ttyS0, but /dev/null, /dev/zero,
/dev/random, /dev/urandom. And creating the latter in a tmpfs is quite
simple.
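
Something along these lines would do (just a rough sketch, not actual
LXC code; the mount options are merely one sensible choice, and the
device numbers are the standard Linux ones):

  /* Rough sketch, not actual LXC code: mount a fresh tmpfs on the
   * container's /dev and create only the "virtual" API devices.
   * Must run in the container's mount namespace, while CAP_MKNOD
   * is still available. */
  #include <stdio.h>
  #include <sys/mount.h>
  #include <sys/stat.h>
  #include <sys/sysmacros.h>

  int main(void) {
          static const struct {
                  const char *path;
                  mode_t mode;
                  unsigned maj, min;
          } nodes[] = {
                  { "/dev/null",    0666, 1, 3 },
                  { "/dev/zero",    0666, 1, 5 },
                  { "/dev/full",    0666, 1, 7 },
                  { "/dev/random",  0666, 1, 8 },
                  { "/dev/urandom", 0666, 1, 9 },
                  { "/dev/tty",     0666, 5, 0 },
          };

          if (mount("tmpfs", "/dev", "tmpfs",
                    MS_NOSUID|MS_STRICTATIME, "mode=755") < 0) {
                  perror("mount /dev");
                  return 1;
          }

          for (unsigned i = 0; i < sizeof(nodes)/sizeof(nodes[0]); i++)
                  if (mknod(nodes[i].path, S_IFCHR | nodes[i].mode,
                            makedev(nodes[i].maj, nodes[i].min)) < 0)
                          perror(nodes[i].path);

          return 0;
  }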

> If I run lxc-console (which attaches to one of the vtys) it gives me
> nothing.  Under sysvinit and upstart I get vty login prompts because
> they have started getty on those vtys.  This is important in case
> network access has not started for one reason or another and the
> container was started detached in the background.

The getty behaviour of systemd in containers is documented here:

http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

If LXC mounts ptys on top of the VT devices, that's a really bad idea
too, since /dev/tty1 and friends expose a number of APIs beyond the mere
tty device that you cannot emulate that way. That includes files in /sys,
as well as /dev/vcs and /dev/vcsa, various ioctls, and so on. Heck, even
the most superficial of things, the $TERM variable, will be
incorrect. LXC shouldn't do that.

LXC really shouldn't pretend a pty is a VT tty; it isn't. The libvirt
folks have proposed that we introduce a new env var to pass to PID 1 of
the container that simply lists the ptys to start gettys on. That way we
don't claim anything about the ttys that they can't live up to, and we
get a clean setup.
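
Purely as illustration of that proposal (the variable name
"container_ttys" and its format are hypothetical here, nothing is
agreed on yet), the container manager would hand something like this to
the container's PID 1 when spawning it:

  /* Hypothetical sketch of the proposal: the variable name
   * "container_ttys" and its value format are illustrative only.
   * The container manager execs the container's init with an
   * environment listing the ptys it allocated, and PID 1 spawns
   * gettys on exactly those. */
  #include <unistd.h>

  int main(void) {
          char *const argv[] = { "/sbin/init", NULL };
          char *const envp[] = {
                  "container=lxc",
                  "container_ttys=pts/1 pts/2 pts/3",  /* hypothetical */
                  NULL
          };
          execve("/sbin/init", argv, envp);
          return 1; /* only reached if execve() failed */
  }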

> I SUSPECT the hang condition is something to do with systemd trying to
> start an interactive console on /dev/console, which sysvinit and
> upstart do not do. 

Yes, this is documented; please see the link I already posted and
linked a second time above.

> I've got some more problems relating to shutting down containers, some
> of which may be related to mounting tmpfs on /run, to which /var/run is
> symlinked.  We're doing halt / restart detection by monitoring utmp
> in that directory but it looks like utmp isn't even in that directory
> anymore and mounting tmpfs on it was always problematical.  We may have
> to have a more generic method to detect when a container has shut down
> or is restarting in that case.

I can't parse this. The system call reboot() is virtualized for
containers just fine, and the container manager (i.e. LXC) can check for
that easily.
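
For example, with PID namespaces on Linux >= 3.4 a reboot() from the
container's init simply terminates that init, and the wait status tells
the manager what was requested. A minimal sketch of such a check:

  /* Sketch: detect how a container shut down by looking at the wait
   * status of its PID 1. With PID namespaces (Linux >= 3.4), reboot()
   * inside the container kills the container's init: SIGHUP in the
   * wait status means a reboot was requested, SIGINT means halt or
   * power-off. */
  #include <signal.h>
  #include <stdio.h>
  #include <sys/wait.h>

  void handle_container_exit(pid_t init_pid) {
          int status;

          if (waitpid(init_pid, &status, 0) < 0) {
                  perror("waitpid");
                  return;
          }

          if (WIFSIGNALED(status) && WTERMSIG(status) == SIGHUP)
                  printf("container requested reboot -> restart it\n");
          else if (WIFSIGNALED(status) && WTERMSIG(status) == SIGINT)
                  printf("container requested halt/poweroff -> tear it down\n");
          else
                  printf("container init exited (status %d)\n", status);
  }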

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

