[systemd-devel] remounting root fs outside containers as MS_SHARED

Lennart Poettering lennart at poettering.net
Wed May 14 09:14:55 PDT 2014


On Tue, 13.05.14 20:16, Ani Sinha (ani at arista.com) wrote:

> The following change started mounting the rootfs as shared :
> 
> b3ac5f8cb98757416d8660023d6564a7c411f0a0
> 
> The commit log and the corresponding comment in the code says that if
> any setups needed the kernel default private mount, one could use
> something like :
> 
> mount --make-rprivate /
> 
> right after the boot.
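For reference, each `mount --make-*` option maps onto a single mount(2) call that only changes the propagation type of an existing mount. So `mount --make-rprivate /` amounts to MS_PRIVATE|MS_REC on "/", and making "/" recursively shared amounts to MS_SHARED|MS_REC. Below is a minimal C sketch of that mapping. `propagation_flags()` and `set_propagation()` are hypothetical helper names, not systemd or util-linux code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mount.h>

/* Map a mount(8) propagation option, without its "--make-" prefix
 * (e.g. "rprivate"), to the mount(2) flags. Returns 0 if unknown. */
unsigned long propagation_flags(const char *name)
{
    assert(name != NULL);

    unsigned long rec = 0;
    /* The "r" variants (rshared, rslave, rprivate, runbindable) apply
     * the change to every submount as well, which is MS_REC. */
    if (name[0] == 'r') {
        rec = MS_REC;
        name++;
    }

    if (strcmp(name, "shared") == 0)
        return MS_SHARED | rec;
    if (strcmp(name, "slave") == 0)
        return MS_SLAVE | rec;
    if (strcmp(name, "private") == 0)
        return MS_PRIVATE | rec;
    if (strcmp(name, "unbindable") == 0)
        return MS_UNBINDABLE | rec;
    return 0;
}

/* Change the propagation type of an existing mount point. For this
 * kind of call the kernel ignores source, fstype and data.
 * Returns 0 on success, -1 with errno set on failure. */
int set_propagation(const char *target, const char *name)
{
    unsigned long flags = propagation_flags(name);
    if (flags == 0) {
        errno = EINVAL;
        return -1;
    }
    return mount(NULL, target, NULL, flags, NULL);
}
```

`set_propagation("/", "rprivate")` corresponds to `mount --make-rprivate /`. It needs CAP_SYS_ADMIN and affects every process in the current mount namespace.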

This was true back then, but it isn't really the case anymore. We have
started making use of PrivateTmp= and PrivateDevices= in quite a few
services in the distributions now. Disabling mount propagation for the
root dir is kinda incompatible with that, since it disables mount and
umount propagation from it into the services' namespaces. The effect is
that all mount points that existed in the host when a service started
will stay mounted in that service's mount namespace, with no way to get
rid of them anymore, possibly keeping block devices busy.

At this point in time this means that the root dir should be shared, and
if you use anything else you'd have to go through all services and
disable the PrivateTmp=, PrivateDevices= and ReadOnlyDirectories=
settings everywhere.
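Whether "/" is actually shared in a given mount namespace can be read from /proc/self/mountinfo. A shared mount carries a "shared:N" optional field before the "-" separator (format described in proc(5)). Below is a C sketch of that check. `mountinfo_is_shared()` and `root_is_shared()` are hypothetical helpers, and the sketch assumes the mount point contains no octal-escaped characters, which holds for "/".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Inspect one line of /proc/<pid>/mountinfo (format in proc(5)).
 * Returns 1 if the line is for `mount_point` and has a "shared:N"
 * optional field, 0 if it is for `mount_point` without one (private,
 * slave-only or unbindable), -1 for another mount point or bad input.
 * Octal escapes such as "\040" in paths are not decoded. */
int mountinfo_is_shared(const char *line, const char *mount_point)
{
    assert(line != NULL && mount_point != NULL);

    char *copy = strdup(line);
    if (copy == NULL)
        return -1;

    int result = -1;
    int field = 0;
    char *save = NULL;
    for (char *tok = strtok_r(copy, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field == 5 && strcmp(tok, mount_point) != 0)
            break;                      /* field 5 is the mount point */
        if (field < 7)
            continue;                   /* IDs, dev, root, mount point, options */
        if (strcmp(tok, "-") == 0) {    /* end of optional fields */
            result = 0;
            break;
        }
        if (strncmp(tok, "shared:", 7) == 0) {
            result = 1;
            break;
        }
    }

    free(copy);
    return result;
}

/* Returns 1 if "/" is shared in the caller's mount namespace, 0 if it
 * is not, -1 if that could not be determined. */
int root_is_shared(void)
{
    FILE *f = fopen("/proc/self/mountinfo", "re");
    if (f == NULL)
        return -1;

    char *line = NULL;
    size_t len = 0;
    int result = -1;
    while (getline(&line, &len, f) != -1) {
        int r = mountinfo_is_shared(line, "/");
        /* An over-mount is normally listed after the mount it covers,
         * so keep the last match for "/". */
        if (r >= 0)
            result = r;
    }

    free(line);
    fclose(f);
    return result;
}
```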

> Unfortunately, we have a setup where we do need the kernel default
> private mount and we tried what has been suggested by using a systemd
> service file to remount rootfs to private. Unfortunately, while this

Why precisely would you want to disable propagation from the root dir?
This is usually an indication of very broken software that is unable to
unshare its mounts properly from the root dir after opening its own
mount namespace.

Normally, if something calls unshare(CLONE_NEWNS) (or the equivalent via
clone()), then it needs to follow this by mount(NULL, "/", NULL,
MS_SLAVE|MS_REC, NULL). Not doing that is simply broken...
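A minimal C sketch of that sequence, assuming the caller has CAP_SYS_ADMIN. `enter_private_mount_ns()` is a hypothetical name. After it returns 0, mount and umount events from the host still propagate in, but mounts made inside the new namespace no longer propagate back to the host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>

/* Move the calling process into a new mount namespace and make sure
 * nothing mounted in it propagates back to the parent namespace.
 * Returns 0 on success or a negative errno value on failure. */
int enter_private_mount_ns(void)
{
    /* The new namespace starts as a copy of the current mount table.
     * Copies of shared mounts join their originals' peer groups, so
     * at this point mount events would still flow in both directions. */
    if (unshare(CLONE_NEWNS) < 0)
        return -errno;

    /* Recursively turn every mount below "/" into a slave: mount and
     * umount events from the parent still arrive here, but events
     * generated here are no longer sent back. */
    if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
        return -errno;

    return 0;
}
```

A child created with clone(CLONE_NEWNS) needs the same mount() call as its first step.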

> As it is not possible to go and fix all these libraries, I have a
> simple request from the systemd hackers here. Can we please have a
> configuration option (either as a kernel command line, or a systemd
> startup command line or a config file option) that disables this
> default behaviour for setups that do need the private rootfs mount?
> That way the default remains as is for most systems and yet there will
> be a way to override this when one really wants to. It would seem to
> give us the best of both worlds.

Sorry, but I think having the root namespace set to anything but
MS_SHARED is really broken, and we should not support that... I also
fail to see the use case for this, except as a workaround for
seriously broken software...

Lennart

-- 
Lennart Poettering, Red Hat

