[systemd-devel] Systemd usage wrt libvirt-sandbox

Lennart Poettering lennart at poettering.net
Sun Mar 4 14:34:28 PST 2012


On Thu, 01.03.12 15:29, Daniel P. Berrange (berrange at redhat.com) wrote:

Heya,

> The libvirt-sandbox project[1] is providing an API and command line tools for
> constructing application sandboxes. It uses either LXC or KVM virtualization
> via libvirt, to confine execution of an application binary, giving it a
> read-only view of the host root filesystem, with custom writable areas
> grafted onto selected paths. eg if running httpd inside a sandbox, we give
> it a private /etc/httpd and /var/www, etc.
> 
> The idea is to get the security isolation benefits of virtualization
> technology, without the administrative burden of extra OS installs
> that it normally entails. As such the only processes running inside
> each sandbox are the application being confined, and a minimal custom
> "init" binary provided by libvirt-sandbox itself.
> 
> As we expand our use cases though, particularly to cover the "secure
> containers" feature[2] in Fedora 17, it is clear that if we're not
> careful, our minimal "libvirt-sandbox-init-common" binary is going
> to turn into a poor man's copy of systemd. We want to avoid that, and
> instead actually make use of systemd directly.
> 
> Since the sandbox shares the same root filesystem as the host, we
> can't simply exec 'systemd' as is. We'll need to set up a few custom
> writable mounts, where we write out custom units / targets, and
> let systemd keep any state.
> 
> So I'm trying to figure out just what is the absolute minimal setup we
> can configure for systemd. Our primary target for development is to
> sandbox apache. So I'd like to figure out what minimal config / directory
> structure I need to create to run systemd and have it only run apache,
> and a login shell (for debug inside the sandbox).
> 
> I'm guessing that I can perhaps get away with setting up an override
> of the host's /etc/systemd, and writing out custom basic.target
> and default.target unit files, which merely run httpd.unit and
> a shell?

It is our intention to make systemd run sensibly without any
configuration files at all (i.e. empty /etc). And unless there is a bug
somewhere this should work already.

So one option you have is to take advantage of the fact that systemd
looks for unit files in /etc, /run and /lib, where earlier directories
override later ones. Making use of that you could trivially override the
default.target symlink, and pull in from there only what you really
need. There will be a couple of caveats however, since normal service
units implicitly pull in basic.target (which you can turn off
individually with DefaultDependencies=no), and that will still pull in
some system-provided systemd units.
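
As a rough illustration of the override idea, here is a minimal Python
sketch that writes a custom target into a unit directory and points the
default.target symlink at it. The name sandbox.target and its contents
are hypothetical, httpd.service is assumed to be the name of the Apache
unit, and the directory is a parameter standing in for whichever unit
directory the sandbox makes writable (for example /run/systemd/system).

```python
import os
import tempfile

# Hypothetical target; the unit names it pulls in are assumptions.
SANDBOX_TARGET = """\
[Unit]
Description=Sandbox target (hypothetical)
DefaultDependencies=no
Wants=httpd.service console-shell.service
"""


def write_override(unit_dir):
    """Write sandbox.target into unit_dir and make it the default target."""
    os.makedirs(unit_dir, exist_ok=True)
    with open(os.path.join(unit_dir, "sandbox.target"), "w") as f:
        f.write(SANDBOX_TARGET)
    link = os.path.join(unit_dir, "default.target")
    # Replace any earlier symlink so the function can be run repeatedly.
    if os.path.lexists(link):
        os.remove(link)
    os.symlink("sandbox.target", link)
    return link


if __name__ == "__main__":
    # A scratch directory stands in for the real unit directory.
    with tempfile.TemporaryDirectory() as root:
        link = write_override(os.path.join(root, "run", "systemd", "system"))
        print(link, "->", os.readlink(link))
```

Note that DefaultDependencies=no here only affects the hypothetical
target itself. A stock httpd unit would still pull in basic.target
unless it carries the same setting.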

We actually test systemd regularly in container environments (mostly
nspawn at this point, since LXC is kinda borked on Fedora), and make
sure everything boots up cleanly. And this works fine (though you do see
a couple of error messages one can safely ignore). Without too much work
we should be able to make this entirely clean, by sprinkling a couple of
ConditionVirtualization= settings here and there, so that we do not
even try to execute certain things, for example console setup. Some
things you probably do want to keep in the container however, the
tmpfiles stuff being one example.
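
To make the ConditionVirtualization= idea concrete, here is a small
Python sketch that renders a service unit which is skipped inside
containers. The unit description and the ExecStart= command are
placeholders, and the negated "!container" value is assumed to be
accepted in the form described in systemd.unit(5).

```python
def render_unit(description, exec_start, skip_in_container=True):
    """Return the text of a simple oneshot service unit."""
    unit = ["[Unit]", "Description=" + description]
    if skip_in_container:
        # Negated match: the condition fails, and the unit is skipped,
        # when systemd detects that it is running in a container.
        unit.append("ConditionVirtualization=!container")
    service = ["[Service]", "Type=oneshot", "ExecStart=" + exec_start]
    return "\n".join(unit + [""] + service) + "\n"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(render_unit("Console setup (hypothetical)", "/bin/true"), end="")
```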

So there are a number of ways to go here. We have been working towards the
"make an unmodified systemd work in containers" goal. If you want to go more
minimal and not even include the actual unit files you don't need in
your container then things become a bit more complex, since you need to
whitelist what you still want.

In the latter case, the units you really really need are none. However,
you might want:

systemd-tmpfiles-clean.service
systemd-tmpfiles-clean.timer
systemd-tmpfiles-setup.service
console-shell.service
halt.service
reboot.service
poweroff.service
basic.target
emergency.service
emergency.target
final.target
getty.target
halt.target
multi-user.target
poweroff.target
reboot.target
rescue.service
rescue.target
shutdown.target
sockets.target
sysinit.target
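
If you go the whitelist route, populating the container's unit
directory could look roughly like the following Python sketch. It
symlinks only the units listed above from a source directory into a
destination directory. Both directory arguments are assumptions about
how the sandbox lays out its filesystem, not something systemd
prescribes.

```python
import os

# The unit names listed above.
WHITELIST = frozenset([
    "systemd-tmpfiles-clean.service", "systemd-tmpfiles-clean.timer",
    "systemd-tmpfiles-setup.service", "console-shell.service",
    "halt.service", "reboot.service", "poweroff.service", "basic.target",
    "emergency.service", "emergency.target", "final.target",
    "getty.target", "halt.target", "multi-user.target", "poweroff.target",
    "reboot.target", "rescue.service", "rescue.target", "shutdown.target",
    "sockets.target", "sysinit.target",
])


def populate(src_dir, dst_dir, whitelist=WHITELIST):
    """Symlink whitelisted units from src_dir into dst_dir.

    Returns the sorted list of unit names that were linked.
    """
    os.makedirs(dst_dir, exist_ok=True)
    kept = []
    for name in sorted(os.listdir(src_dir)):
        if name not in whitelist:
            continue
        link = os.path.join(dst_dir, name)
        # Skip names that are already present in the destination.
        if not os.path.lexists(link):
            os.symlink(os.path.join(src_dir, name), link)
        kept.append(name)
    return kept
```

Units that are whitelisted but missing from the source directory are
simply ignored, so the same list can be reused across distributions
that ship slightly different unit sets.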

But I think it would be a better idea, and more future-proof, to leave
all unit files in, but make them no-ops when run in a container (for
example, by using ConditionVirtualization= as mentioned above).

Kay and I discussed introducing a new switch to systemd, called
--container, which would be available in addition to --system (when run
on a normal machine) and --user (when run for a specific user to manage
user daemons) which would alter the way we look for units. But we
couldn't really nail down the semantics for this.

We are quite open to adding new container-related features to systemd,
in order to minimally alter how systemd works in containers. Our story
is not entirely round there yet, but we are very open for ideas to make
containers work beautifully with systemd.

systemd is quite happy if /sys, /dev, /run and so on are pre-mounted
when it is first executed. In fact, initrds tend to mount these
directories for us already, and so does nspawn actually.
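
A sandbox init that wants to pre-mount these itself could sanity-check
the result before executing systemd. The following Python sketch only
does the check, not the mounting. The path list covers just the three
directories named above, and the helper name is made up for this
example.

```python
import os

# Directories named above; extend as needed for your setup.
EXPECTED_MOUNTS = ("/sys", "/dev", "/run")


def missing_mounts(paths=EXPECTED_MOUNTS, ismount=os.path.ismount):
    """Return the paths from the list that are not mount points."""
    return [p for p in paths if not ismount(p)]


if __name__ == "__main__":
    missing = missing_mounts()
    if missing:
        print("not pre-mounted:", ", ".join(missing))
    else:
        print("all expected directories are mount points")
```

The ismount parameter exists only so the check can be exercised
without touching the real filesystem.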

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

