[systemd-devel] "dynamic" uid allocation (was: [PATCH] loopback setup in unprivileged containers)
Daniel P. Berrange
berrange at redhat.com
Wed Feb 4 01:01:35 PST 2015

On Tue, Feb 03, 2015 at 06:05:00PM +0100, Lennart Poettering wrote:
> On Tue, 03.02.15 16:34, Serge Hallyn (serge.hallyn at ubuntu.com) wrote:
>
> > > > the UID/GID on entire filesystem sub-trees given to containers with
> > > > userns is a real unpleasant thing to have to deal with. I'd not want
> >
> > Of course you would *not* want to take a stock rootfs where uid == 0
> > and shift that into the container, as that would give root in the
> > container a chance to write root-owned files on the host to leverage
> > later in a convoluted attack :)
>
> Is this really a problem? I mean, the only way how this could be
> exploitable is if people make the container hierarchy accessible to
> other users, but that should be easy to prohibit by making the
> container's parent dir 0700, which we already do for nspawn's
> container in /var/lib/machines... The only other risk I can see here
> is that if people use traditional ext4 quota, then the container's
> disk usage will be added to the host's usage. But that's easy to
> avoid, by simply never placing container images and the host on the
> same quota device...
>
> Also, in the case of systemd-nspawn we strongly emphasize usage with
> loopback devices. In that case there's no vulnerability at all, since
> the device is completely separate from the host fs, and it will only
> be mounted in the container, but not in the host...

NB, the container filesystem is still visible via /proc/$PID/root,
but I agree with you in general. I don't see a reason to avoid
the scenario Serge mentioned. Indeed I think it is important that
we explicitly support it, because ultimately I think we need to
be able to take any arbitrary disk image and safely boot it in
either a container or virtual machine, i.e. we should not have to
build custom images just for containers. Any such need should be
considered a failure of the technology / impl IMHO.

> > We might want to come up with a containers consensus that container
> > rootfs's are always shipped with uid range 0-65535 -> 100000-165535.
> > That still leaves a chance for container A (mapped to 200000-265535)
> > to write valid setuid-root binary for container B (mapped to
> > 300000-365535), which isn't possible otherwise. But that's better
> > than doing so for host-root.
>
> Well, ultimately I'd recommend an automatism like this for container
> managers:
>
> a) if not otherwise configured, let's give each container their own
> 16bit of uids. This would mean each 32bit uid could be neatly
> split into the upper 16bit that would become a "container" id,
> plus the lower 16bit for the actual "virtual" UID.
>
> b) we will never set up UID ranges orthogonal from GID ranges.
>
> c) when a container image is started, the container manager first
> checks the UID/GID owner of the root of the root file system. It
> masks the lower 16bit away, and only looks for the upper 16bit.
>
> d) It will then look for an unused container id (which means, an
> unused range of 64K UIDs), and then shifts the offset it
> identified following c) to this new container id.
>
> With that in place it doesn't really matter which base people use in
> their containers, the container manager would do the right thing, and
> shift everything into the right place. Paranoid people could ship
> their container images shifted to some ID of their choice, and lazy
> folks could just ship their container images with base 0, but then
> must make sure they don't give anybody else access to the hierarchy,
> and don't confuse quota...
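
To make the arithmetic in steps a) to d) concrete, here is a rough
sketch of how a container manager might do it. This is only an
illustration under stated assumptions, not code from nspawn or any
other manager: all function names are hypothetical, and skipping
container ids 0 (taken to be the host's own range) and 0xFFFF (its
range would contain the reserved uid 4294967295) is my assumption.
The final helper prints a line in the kernel's
/proc/$PID/uid_map format (inside-id, outside-id, length).

```python
RANGE_BITS = 16
RANGE_SIZE = 1 << RANGE_BITS          # 65536 UIDs per container
LOW_MASK = RANGE_SIZE - 1             # 0xFFFF


def split_uid(uid):
    """Step a): split a 32bit uid into (container id, virtual uid)."""
    return uid >> RANGE_BITS, uid & LOW_MASK


def image_base(root_owner_uid):
    """Step c): mask away the lower 16bit of the owner of the image root."""
    return root_owner_uid & ~LOW_MASK


def pick_free_container_id(in_use):
    """Step d): find an unused container id (an unused range of 64K UIDs).

    Assumption: id 0 is the host's range and id 0xFFFF is skipped
    because it would include the reserved uid 4294967295.
    """
    for cid in range(1, LOW_MASK):
        if cid not in in_use:
            return cid
    raise RuntimeError("no free 64K uid range left")


def shift_uid(uid, old_base, new_base):
    """Step d): move one on-disk uid from the image's base to the new base."""
    if not old_base <= uid < old_base + RANGE_SIZE:
        raise ValueError("uid %d is outside the image's range" % uid)
    return uid - old_base + new_base


def uid_map_line(new_base):
    """uid_map entry mapping container uids 0..65535 onto the new range."""
    return "0 %d %d" % (new_base, RANGE_SIZE)


if __name__ == "__main__":
    old_base = image_base(0)                       # a "lazy" image with base 0
    new_base = pick_free_container_id({1, 2}) << RANGE_BITS
    print(shift_uid(1000, old_base, new_base))     # on-disk uid after shifting
    print(uid_map_line(new_base))
```

Per b), the same offsets would be applied to GIDs. An image shipped
with base 0 and one shipped pre-shifted to some other 64K-aligned
base go through exactly the same path, since only the value returned
by image_base() differs.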

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|