[systemd-devel] "dynamic" uid allocation (was: [PATCH] loopback setup in unprivileged containers)

Tue Feb 3 07:03:09 PST 2015

On Tue, Feb 03, 2015 at 03:41:22PM +0100, Lennart Poettering wrote:
> On Tue, 30.12.14 06:49, Simon Peeters (peeters.simon at gmail.com) wrote:
> 
> > 2014-12-29 14:14 GMT+00:00 Tom Gundersen <teg at jklm.no>:
> > > On Mon, Dec 29, 2014 at 2:34 PM, Lennart Poettering
> > > <lennart at poettering.net> wrote:
> > <snip>
> > >> I am open to adding support for this, but I think the allocation of
> > >> the UID ranges should really happen automatically, and not be
> > >> something the admin has to manually assign.
> > >>
> > >> Which means we'd enter dynamic UID allocation territory, and that
> > >> opens a huge can of worms...
> > >
> > > Would we not also need to support explicit assignment, in case someone
> > > has a preexisting image they want to match in a specific way? In that
> > > case we could start off without the dynamic allocation and add that
> > > later. It certainly would make testing a lot simpler if we had userns
> > > support sooner rather than later (at least in the case of netlink it
> > > appears to be quite a mess).
> > 
> > Inspired by this topic I wrote a quick'n'dirty UID allocator[1].
> > This allocator manages the upper 2G UIDs, which, using Matthias Urlichs'
> > example of 2048 UIDs per container, still allows for 1M containers.
> > 
> > It currently can't persist these allocations, but that is on my
> > "0.0.1" todo list.
> 
> Hmm, so, I thought a lot about this in the past weeks. I think the way
> I'd really like to see this work in the end is that we never have to
> persist the UID mappings. This could work if the kernel would provide
> us with the ability to bind mount a file system into the container
> applying a fixed UID shift. That way, the shifted UIDs would never hit
> the actual disk, and hence we wouldn't have to persist their mappings.
> 
> Instead on each container startup we'd look for a new UID range, and
> release it entirely when the container shuts down. The bind mount with
> UID shift would then shift the UIDs up, the userns stuff would shift
> it down from inside the container again.
> 
> Of course, this all depends on whether the kernel will get an
> extension to apply uid shifts to bind mounts. I hear they want to
> provide this, but let's see.

I would dearly love to see that happen. Having to recursively change
the UID/GID on entire filesystem sub-trees given to containers with
userns is a really unpleasant thing to have to deal with. I wouldn't want
the filesystem UID shift to only apply to bind mounts though. It is
not uncommon to use a disk image[1] for a container's filesystem, so
being able to request a UID shift on *any* filesystem mount is pretty
desirable, rather than having to mount the image and then bind mount
it onto itself just to apply the UID shift.
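
To make the arithmetic concrete, here is a rough sketch of the scheme as
I understand it from the thread: pick a free UID range at container
startup, shift on-disk UIDs up at mount time, let userns shift them back
down, and drop the range at shutdown. All names and constants below are
hypothetical and only for illustration; this is not Simon's allocator
and not any existing kernel or systemd interface.

```python
# Illustrative sketch only: every name and constant here is an assumption.
UID_BASE = 2**31      # assumed start of the "upper 2G" UID space
UID_SPACE = 2**31     # number of UIDs under management
RANGE_SIZE = 2048     # UIDs handed to each container
MAX_CONTAINERS = UID_SPACE // RANGE_SIZE  # 2**31 / 2**11 = 2**20 ranges


class UidRangeAllocator:
    """In-memory allocator; nothing is persisted across restarts."""

    def __init__(self):
        self._in_use = set()

    def allocate(self):
        """Return the base UID of the lowest free range."""
        for slot in range(MAX_CONTAINERS):
            if slot not in self._in_use:
                self._in_use.add(slot)
                return UID_BASE + slot * RANGE_SIZE
        raise RuntimeError("no free UID range left")

    def release(self, base):
        """Give a range back when its container shuts down."""
        self._in_use.discard((base - UID_BASE) // RANGE_SIZE)


def shift_up(disk_uid, base):
    """UID as stored on disk -> UID the host sees through the shifted mount."""
    if not 0 <= disk_uid < RANGE_SIZE:
        raise ValueError("UID outside the container's range")
    return base + disk_uid


def shift_down(host_uid, base):
    """Host UID -> UID seen inside the container via the userns mapping."""
    if not base <= host_uid < base + RANGE_SIZE:
        raise ValueError("UID not mapped into this container")
    return host_uid - base


if __name__ == "__main__":
    allocator = UidRangeAllocator()
    base = allocator.allocate()            # container startup
    host_uid = shift_up(0, base)           # root in the image, seen from the host
    print(base, host_uid, shift_down(host_uid, base))
    allocator.release(base)                # container shutdown
```

Since the image on disk only ever contains unshifted UIDs, the range
picked at startup can differ on every boot. A real allocator would also
have to skip reserved values such as (uid_t) -1, which this sketch
ignores.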

Regards,
Daniel

[1] Using a separate disk image per container means a container can't
    DoS other containers by exhausting inodes, for example with $millions
    of small files.
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|