[systemd-devel] Docker vs PrivateTmp

Alexander Larsson alexl at redhat.com
Fri Jan 30 02:02:10 PST 2015


On Fri, 2015-01-23 at 11:31 -0500, Daniel J Walsh wrote:
> On 01/22/2015 10:02 PM, Lennart Poettering wrote:
> > On Sat, 17.01.15 23:02, Lars Kellogg-Stedman (lars at redhat.com) wrote:
> >
> >> See the `devicemapper` mountpoint created by Docker for the container:
> >>
> >>     # grep devicemapper/mnt /proc/mounts
> >>     /dev/mapper/docker-253:6-98310-e68df3f45d6151259ce84a0e467a3117840084e99ef3bbc654b33f08d2d6dd62
> >>     /var/lib/docker/devicemapper/mnt/e68df3f45d6151259ce84a0e467a3117840084e99ef3bbc654b33f08d2d6dd62
> >>     ext4
> >>     rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c261,c1018",relatime,discard,stripe=16,data=ordered
> >>     0 0
> > I am not sure why docker makes these mounts visible in the host
> > namespace at all. This smells like a bug.

They need to be visible at least to the docker daemon, because it needs
to look into them to compute diffs between images, e.g. when committing.
They don't necessarily have to be in the host namespace, though; they
could be in a separate namespace owned only by the docker daemon. I
wanted to do that, but for reasons that escape me at the moment it was
problematic, and I never got around to it.

> >> Watch Docker fail to destroy the container because it is unable to remove the mountpoint directory:
> >>
> >>     Jan 17 22:43:03 pk115wp-lkellogg docker-1.4.1-dev[18239]:
> >>     time="2015-01-17T22:43:03-05:00" level="error" msg="Handler for DELETE
> >>     /containers/{name:.*} returned error: Cannot destroy container e68df3f45d61:
> >>     Driver devicemapper failed to remove root filesystem
> >>     e68df3f45d6151259ce84a0e467a3117840084e99ef3bbc654b33f08d2d6dd62: Device is
> >>     Busy"
> > This smells as if Docker incorrectly sets the mount propagation bits
> > on its own mounts.
> >
> > It would be good to check /proc/self/mountinfo inside and outside of
> > docker's own namespace, and to see how the propagation bits are set
> > for the individual mounts. It's a bit hard to read, but the
> > interesting bits are in the 7th column of that file.
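A minimal Python sketch of that check, assuming the /proc/[pid]/mountinfo layout described in proc(5): six fixed fields, then zero or more optional fields (shared:N, master:N, propagate_from:N, unbindable) starting at the 7th field, a lone "-" separator, and then filesystem type, source and superblock options. `MountEntry`, `parse_mountinfo_line` and `read_mountinfo` are hypothetical helper names for illustration; a mount with no optional tags is reported as private.

```python
import re
from dataclasses import dataclass
from typing import List

# mountinfo escapes space, tab, newline and backslash as \ooo (octal).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(text: str) -> str:
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


@dataclass
class MountEntry:
    mount_id: int
    parent_id: int
    mount_point: str
    optional: List[str]  # e.g. ["shared:5"], ["master:1"] or []
    fstype: str
    source: str

    @property
    def propagation(self) -> str:
        """Classify the mount from its optional propagation tags."""
        tags = {tag.split(":", 1)[0] for tag in self.optional}
        if "unbindable" in tags:
            return "unbindable"
        if "shared" in tags and "master" in tags:
            return "slave+shared"
        if "master" in tags:
            return "slave"
        if "shared" in tags:
            return "shared"
        return "private"


def parse_mountinfo_line(line: str) -> MountEntry:
    fields = line.split()
    # Optional fields start at index 6 (the 7th field) and end at "-".
    sep = fields.index("-", 6)
    return MountEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        mount_point=_unescape(fields[4]),
        optional=fields[6:sep],
        fstype=fields[sep + 1],
        source=_unescape(fields[sep + 2]),
    )


def read_mountinfo(path: str = "/proc/self/mountinfo") -> List[MountEntry]:
    with open(path) as f:
        return [parse_mountinfo_line(line) for line in f if line.strip()]
```

Running `read_mountinfo()` once on the host and once inside the other namespace, then comparing `propagation` for the /var/lib/docker/devicemapper mounts, would show how each side's copy is configured.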
> >
> > In general: docker should do the equivalent of "mount --make-rslave /"
> > as the first thing after opening its mount namespace, so that from that
> > point on mounts and especially *un*mounts propagate from the host into
> > the container, but not vice versa.
> >
> > If they do not invoke that, then the propagation will stay at
> > "shared", which means the mounts will appear in the host and vice
> > versa, which is certainly undesired.
> >
> > Also, they should not use "mount --make-rprivate /", as that means
> > anything the host mounted will stay mounted in the container forever,
> > which is a problem.
> >
> > Also, they really need to make this recursive, so that all mount
> > points they have access to are detached from the host!
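A minimal sketch of "unshare, then mount --make-rslave /" from Python via ctypes, assuming Linux, a glibc-style libc, and CAP_SYS_ADMIN. The flag values are the ones from the Linux uapi headers; `propagation_flags` and `unshare_mount_namespace_rslave` are hypothetical helper names, not Docker's actual code.

```python
import ctypes
import ctypes.util
import os

# Values from the Linux uapi headers (<linux/sched.h>, <linux/mount.h>).
CLONE_NEWNS = 0x00020000
MS_REC = 0x4000
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

_MODES = {"private": MS_PRIVATE, "slave": MS_SLAVE, "shared": MS_SHARED}


def propagation_flags(mode: str) -> int:
    """Translate a mount(8) --make-<mode> name (e.g. "rslave") into flags."""
    if mode in _MODES:
        return _MODES[mode]
    if mode.startswith("r") and mode[1:] in _MODES:
        # The "r" prefix means recursive: every mount below the target too.
        return _MODES[mode[1:]] | MS_REC
    raise ValueError(f"unknown propagation mode: {mode!r}")


def _check(ret: int) -> None:
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def unshare_mount_namespace_rslave() -> None:
    """Give this process its own mount namespace, then make / recursively slave.

    Host mounts and unmounts keep propagating in, but mounts made here do
    not leak out. "rprivate" would instead keep host mounts alive in this
    namespace even after the host unmounts them.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.unshare.argtypes = [ctypes.c_int]
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    _check(libc.unshare(CLONE_NEWNS))
    # Equivalent of "mount --make-rslave /": source, fstype and data are NULL.
    _check(libc.mount(None, b"/", None, propagation_flags("rslave"), None))
```

A daemon would call `unshare_mount_namespace_rslave()` once at startup, before doing any mounts of its own; without CAP_SYS_ADMIN the `unshare` call fails and the helper raises `OSError` (EPERM).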

It has been a while since I looked at this, but I believe that docker
containers run as MS_PRIVATE, and they explicitly unmount all host
filesystems except the ones specifically mounted in as volumes.

I think the problem is that the docker daemon makes
/var/lib/docker/devicemapper private in the host namespace to handle
some scalability issues we found in the kernel. This causes problems not
with docker containers (because they unmount all other mounts, as
described above), but with other apps that use namespaces. For instance,
if a service with PrivateTmp is launched, it will inherit the mounts that
exist in /var/lib/docker/devicemapper at the moment it starts, but when
these are eventually unmounted in the host namespace, the unmount is not
propagated into the service (because it is a private mount, not a slave
mount).
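A toy Python model of that scenario, purely to illustrate the semantics and not how the kernel implements propagation: the service's namespace starts as a copy of the host's mounts under /var/lib/docker/devicemapper, the host later unmounts one of them, and the copy is updated only if its propagation mode lets events flow in from the host. `Propagation`, `simulate_unmount` and the mount names are hypothetical.

```python
from enum import Enum
from typing import Set, Tuple


class Propagation(Enum):
    SHARED = "shared"    # events flow both ways between peers
    SLAVE = "slave"      # events flow from the master (host) into the copy only
    PRIVATE = "private"  # no events flow in either direction


def receives_host_events(mode: Propagation) -> bool:
    return mode in (Propagation.SHARED, Propagation.SLAVE)


def simulate_unmount(mode: Propagation, at_startup: Set[str],
                     unmounted_on_host: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (host view, service view) after the host unmounts some mounts."""
    # The service's namespace is a copy taken when the service starts.
    service_view = set(at_startup)
    host_view = set(at_startup) - unmounted_on_host
    if receives_host_events(mode):
        service_view -= unmounted_on_host
    return host_view, service_view
```

With `Propagation.PRIVATE` the service keeps the unmounted entry, so the device stays in use, which fits the "Device is Busy" error in the quoted log; with `Propagation.SLAVE` the unmount reaches the service as well.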

We could try making this a slave mount instead, but I don't know whether
that still fixes the scalability issues we had, because they were related
to stupidities in the kernel with regard to mount propagation. If it
doesn't work, then we have to put the docker daemon in its own namespace.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
       alexl at redhat.com            alexander.larsson at gmail.com 


