[systemd-devel] Unable to run systemd in an LXC / cgroup container.

Lennart Poettering mzerqung at 0pointer.de
Mon Oct 22 07:11:59 PDT 2012


On Sun, 21.10.12 17:25, Michael H. Warfield (mhw at WittsEnd.com) wrote:

> Hello,
> 
> This is being directed to the systemd-devel community but I'm cc'ing the
> lxc-users community and the Fedora community on this for their input as
> well.  I know it's not always good to cross post between multiple lists
> but this is of interest to all three communities who may have valuable
> input.
> 
> I'm new to this particular list, just having joined after tracking a
> problem down to some systemd internals...
> 
> Several people over the last year or two on the lxc-users list have been
> discussing how to run certain distros (notably Fedora 16 and above,
> recent Arch Linux and possibly others) in LXC containers, virtualizing
> entire servers this way.  This is very similar to Virtuozzo / OpenVZ only
> it's using the native Linux cgroups for the containers (primary reason I
> dumped OpenVZ was to avoid their custom patched kernels).  These recent
> distros have switched to systemd for the main init process and this has
> proven to be disastrous for those of us using LXC and trying to install
> or update our containers.

Note that it is explicitly our intention to make running systemd inside
of containers as smooth as possible. The notes Kay linked summarize what
the container manager needs to do for best integration.

> To summarize the problem...  The LXC startup binary sets up various
> things for /dev and /dev/pts for the container to run properly and this
> works perfectly fine for SystemV start-up scripts and/or Upstart.
> Unfortunately, systemd has mounts of devtmpfs on /dev and devpts
> on /dev/pts which then break things horribly.  This is because the
> kernel currently lacks namespaces for devices and won't for some time to
> come (in design).  When devtmpfs gets mounted over top of /dev in the
> container, it then hijacks the host's console tty and several other
> devices which had been set up through bind mounts by LXC and should have
> been LEFT ALONE.

Please initialize a minimal tmpfs on /dev. systemd will then work fine.
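
As a rough illustration of what "a minimal tmpfs on /dev" could look
like from the container manager's side, here is a Python sketch. It is
not taken from the LXC or systemd sources; the node list, the function
names (node_plan, setup_minimal_dev) and the tmpfs options are
assumptions chosen for the example. Only the major/minor numbers are
the standard Linux ones.

```python
import ctypes
import os
import stat

MS_NOSUID = 2  # value from <sys/mount.h>

# (name, permission bits, major, minor): standard Linux character devices.
# Which nodes a given container needs is an assumption of this sketch.
DEV_NODES = [
    ("null", 0o666, 1, 3),
    ("zero", 0o666, 1, 5),
    ("full", 0o666, 1, 7),
    ("random", 0o666, 1, 8),
    ("urandom", 0o666, 1, 9),
    ("tty", 0o666, 5, 0),
]


def node_plan(dev_dir):
    """Return (path, mode, device number) for every node to create."""
    return [
        (os.path.join(dev_dir, name),
         stat.S_IFCHR | perms,
         os.makedev(major, minor))
        for name, perms, major, minor in DEV_NODES
    ]


def setup_minimal_dev(dev_dir):
    """Mount a tmpfs on dev_dir and populate it. Requires root."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mount(b"tmpfs", os.fsencode(dev_dir), b"tmpfs",
                  ctypes.c_ulong(MS_NOSUID), b"mode=755") != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), dev_dir)
    for path, mode, device in node_plan(dev_dir):
        os.mknod(path, mode, device)
        os.chmod(path, stat.S_IMODE(mode))  # mknod() is subject to the umask
    os.mkdir(os.path.join(dev_dir, "pts"), 0o755)  # mount point for devpts
```

The container manager would call something like setup_minimal_dev() on
the container's /dev before starting init, and then add its own bind
mounts (console and so on) on top of that tmpfs.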

> Yes!  I recognize that this problem with devtmpfs and lack of namespaces
> is a potential security problem anyways that could (and does) cause
> serious container-to-host problems.  We're just not going to get that
> fixed right away in the linux cgroups and namespaces.

No, devtmpfs really doesn't need updating, containers simply shouldn't
use it.

> How do we work around this problem in systemd where it has hard coded
> mounts in the binary that we can't override or configure?  Or is it
> there and I'm just missing it trying to examine the sources?  That's how
> I found where the problem lay.

systemd will make use of pre-existing mounts if they exist, and only
mount something new if they don't exist.
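
The "only mount if nothing is there yet" behaviour can be pictured with
a small Python sketch that reads the /proc/self/mountinfo format. This
is an illustration only, not systemd's actual implementation; the
function names and the sample table are made up for the example.

```python
def parse_mountinfo(text):
    """Map each mount point to its filesystem type.

    Each mountinfo line has the mount point as its 5th field and the
    filesystem type as the first field after the " - " separator.
    Mount points containing spaces are octal-escaped by the kernel,
    so splitting on whitespace is safe.
    """
    mounts = {}
    for line in text.splitlines():
        left, sep, right = line.partition(" - ")
        fields = left.split()
        if not sep or len(fields) < 5 or not right.split():
            continue  # skip lines that do not look like mountinfo entries
        mounts[fields[4]] = right.split()[0]
    return mounts


def needs_mount(path, mounts):
    """True if nothing is mounted at path yet."""
    return path not in mounts


# Hypothetical sample: /dev and /dev/pts were set up by the container manager.
SAMPLE = """\
36 35 0:5 / /dev rw,nosuid shared:2 - tmpfs tmpfs rw,mode=755
37 36 0:6 / /dev/pts rw,relatime shared:3 - devpts devpts rw,gid=5,mode=620
"""

mounts = parse_mountinfo(SAMPLE)
for target in ("/dev", "/dev/pts", "/dev/shm"):
    action = "mount" if needs_mount(target, mounts) else "keep existing"
    print(target, "->", action)
```

With the sample table, /dev and /dev/pts are left alone and only
/dev/shm would be mounted.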

Note that there are reports that LXC has issues with the fact that newer
systemd enables shared mount propagation for all mounts by default (this
should actually be beneficial for containers as this ensures that new
mounts appear in the containers). LXC when run on such a system fails as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
work-around is to simply remount the root tree to private before
invoking LXC.
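
For reference, the short-term work-around corresponds to
"mount --make-rprivate /" from util-linux, i.e. a mount(2) call with
MS_REC|MS_PRIVATE on /. The Python sketch below shows that call plus a
check for shared propagation based on /proc/self/mountinfo. The
function names are hypothetical, and note that running this in the
host's namespace changes propagation for the whole host tree.

```python
import ctypes
import os

MS_REC = 16384         # values from <sys/mount.h>
MS_PRIVATE = 1 << 18


def is_shared(mountinfo_text, mount_point):
    """True if the mount at mount_point carries a "shared:N" tag.

    Optional fields (such as shared:N) sit between the mount options
    (6th field) and the " - " separator. If several mounts are stacked
    on the same mount point, the last listed entry wins.
    """
    shared = False
    for line in mountinfo_text.splitlines():
        left, sep, _ = line.partition(" - ")
        fields = left.split()
        if sep and len(fields) >= 6 and fields[4] == mount_point:
            shared = any(f.startswith("shared:") for f in fields[6:])
    return shared


def make_rprivate(path="/"):
    """Recursively mark the tree at path as private. Requires root."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mount(None, os.fsencode(path), None,
                  ctypes.c_ulong(MS_REC | MS_PRIVATE), None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

A wrapper script could read /proc/self/mountinfo, call make_rprivate()
only if is_shared(text, "/") is true, and then start the container.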

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

