[systemd-devel] Unable to run systemd in an LXC / cgroup container.

Fri Oct 26 08:58:33 PDT 2012

On Thu, 2012-10-25 at 23:38 +0200, Lennart Poettering wrote:
> On Thu, 25.10.12 11:59, Michael H. Warfield (mhw at WittsEnd.com) wrote:

> > > http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
> > 
> > > Unfortunately, in our case, merely getting a mount in there is a
> > > complication in that it also has to be populated but, at least, we
> > > understand the problem set now.
> > 
> > Ok...  Serge and I were corresponding on the lxc-users list and he had a
> > suggestion that worked but I consider to be a bit of a sub-optimal
> > workaround.  Ironically, it was to mount devtmpfs on /dev.  We don't
> > (currently) have a method to auto-populate a tmpfs mount with the needed
> > devices and this provided it.  It does have a problem that makes me
> > uncomfortable in that the container now has visibility into the
> > host's /dev system.  I'm a security expert and I'm not comfortable with
> > that "solution" even with the controls we have.  We can control access
> > but still, not happy with that.

> That's a pretty bad idea, access control to the device nodes in devtmpfs
> is controlled by the host's udev instance. That means if your group/user
> lists in the container and the host differ you already lost. Also access
> control in udev is dynamic, due to stuff like uaccess and similar. You
> really don't want to have that in the container, i.e. where devices
> change ownership all the time with UIDs/GIDs that make no sense at all
> in the container.

Concur.

> In general I think it's a good idea not to expose any "real" devices to
> the container, but only the "virtual" ones that are programming
> APIs. That means: no /dev/sda, or /dev/ttyS0, but /dev/null, /dev/zero,
> /dev/random, /dev/urandom. And creating the latter in a tmpfs is quite
> simple.
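
As an illustration of how little is involved, populating a tmpfs /dev with just those four nodes could look roughly like the sketch below. The major/minor numbers are the standard Linux assignments for these devices; the function names and the 0666 mode are assumptions made for the example, and the mknod calls need CAP_MKNOD. This is a sketch, not what LXC actually does.

```python
import os
import stat

# Standard Linux character-device numbers (major 1) for the "virtual" nodes.
VIRTUAL_DEVICES = {
    "null": (1, 3),
    "zero": (1, 5),
    "random": (1, 8),
    "urandom": (1, 9),
}


def device_plan():
    """Return (name, mode, device number) for each node to create."""
    mode = stat.S_IFCHR | 0o666
    return [(name, mode, os.makedev(major, minor))
            for name, (major, minor) in sorted(VIRTUAL_DEVICES.items())]


def populate_dev(dev_dir):
    """Create the nodes under dev_dir, assumed to be an already mounted tmpfs.

    Needs CAP_MKNOD (in practice, root on the host side).
    """
    old_umask = os.umask(0)  # keep the umask from masking the 0666 mode
    try:
        for name, mode, device in device_plan():
            os.mknod(os.path.join(dev_dir, name), mode, device)
    finally:
        os.umask(old_umask)
```

Calling populate_dev() on the container's freshly mounted /dev (the path is up to the caller) would be the whole job for these four nodes.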

> > If I run lxc-console (which attaches to one of the vtys) it gives me
> > nothing.  Under sysvinit and upstart I get vty login prompts because
> > they have started getty on those vtys.  This is important in case
> > network access has not started for one reason or another and the
> > container was started detached in the background.

> The getty behaviour of systemd in containers is documented here:

> http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

Sorry.  This is unacceptable.  We need some way to make these gettys
active, so that systemd behaves consistently with other containers.

> If LXC mounts ptys on top of the VT devices that's a really bad idea
> too, since /dev/tty1 and friends expose a number of APIs beyond the mere
> tty device that you cannot emulate with that. It includes files in /sys,
> as well as /dev/vcs and /dev/vcsa, various ioctls, and so on. Heck, even
> the most superficial of things, the $TERM variable will be
> incorrect. LXC shouldn't do that.

REGARDLESS.  I'm in this situation now testing what I thought was a hang
condition (which is proving to be something else).  I started a
container detached redirecting the console to a file (a parameter I was
missing) and the log to another file (which I had been doing).  But, for
some reason, sshd is not starting up.  I have no way to attach to the
bloody console of the container and I have no gettys on a vty I can
attach to using lxc-console and I can't remote access a container which,
for all other intents and purposes, appears to be running fine.
Parameterize this bloody thing so we can have control over it.

> LXC really shouldn't pretend a pty was a VT tty, it's not. From the
> libvirt guys it has been proposed that we introduce a new env var to
> pass to PID 1 of the container, that simply lists ptys to start gettys
> on. That way we don't pretend anything about ttys that the ttys can't
> hold and have a clean setup.
> 
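
For illustration only, consuming such a variable inside the container might look like the sketch below. The variable name EXAMPLE_CONTAINER_TTYS and the space-separated format are placeholders invented for this example; the proposal quoted above does not fix either.

```python
import os

# Hypothetical variable name; the proposal above does not specify one.
TTY_LIST_VAR = "EXAMPLE_CONTAINER_TTYS"


def ttys_for_gettys(environ=None):
    """Return the device paths listed in the variable, in order, deduplicated."""
    environ = os.environ if environ is None else environ
    paths = []
    for entry in environ.get(TTY_LIST_VAR, "").split():
        # Accept both "pts/0" and "/dev/pts/0".
        path = entry if entry.startswith("/dev/") else "/dev/" + entry
        if path not in paths:
            paths.append(path)
    return paths
```

An unset or empty variable yields an empty list, which would mean no extra gettys are started.
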
> > I SUSPECT the hang condition is something to do with systemd trying to
> > start an interactive console on /dev/console, which sysvinit and
> > upstart do not do. 
> 
> Yes, this is documented, please see the link I already posted, and which
> I linked above a second time.
> 
> > I've got some more problems relating to shutting down containers, some
> > of which may be related to mounting tmpfs on /run, to which /var/run is
> > symlinked.  We're doing halt / restart detection by monitoring utmp
> > in that directory but it looks like utmp isn't even in that directory
> > anymore and mounting tmpfs on it was always problematical.  We may have
> > to have a more generic method to detect when a container has shut down
> > or is restarting in that case.
> 
> I can't parse this. The system call reboot() is virtualized for
> containers just fine and the container manager (i.e. LXC) can check for
> that easily.
> 
> Lennart
> 
> -- 
> Lennart Poettering - Red Hat, Inc.
> 
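
A minimal sketch of the reboot() check mentioned above, assuming the Linux 3.4+ behaviour described in reboot(2): a reboot() issued inside a PID namespace kills that namespace's init with SIGHUP for a restart and with SIGINT for halt or power-off, and the parent sees that in the wait status. The function name and the return strings are made up for the example.

```python
import os
import signal


def classify_container_exit(status):
    """Map the wait status of the container's init to what the container did."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGHUP:
            return "restart"
        if sig == signal.SIGINT:
            return "halt"
        return "killed by signal %d" % sig
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown"
```

A manager would pass in the status from os.waitpid() on the container's init; no utmp monitoring is involved.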

-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!