[systemd-devel] systemd-nspawn

Wed Apr 17 08:57:27 PDT 2013

On Mon, 15.04.13 20:48, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:

> On Mon, Apr 15, 2013 at 08:36:33PM +0200, Zbigniew Jędrzejewski-Szmek wrote:
> > On Mon, Apr 15, 2013 at 02:31:56PM -0300, Chir0n wrote:
> > > # yum -y --releasever=19 --nogpg --installroot=/srv/mycontainer
> > > --disablerepo='*' --enablerepo=fedora install systemd passwd yum
> > > fedora-release vim-minimal
> > > # systemd-nspawn -bD /srv/mycontainer
> > 
> >   sudo nsenter -t $PID -m -u -i -n -p /bin/bash
> Hm, if I say 'halt' in this bash window, I see
> 
> bash-4.2# halt
> bash-4.2# [1]  + 14306 suspended (signal)  sudo nsenter -t 13221 -m -u -i -n -p /bin/bash
> 
> and the container's init hangs after 'All filesystems unmounted.'.
> 
> Only when I do 'fg' does halt resume and systemd-nspawn quit.
> 
> Apparently this only happens rarely (1/5 so far).
> 
> What's going on?

When we go on a killing spree on shutdown we first SIGSTOP everything so
that processes are terminated out of the blue sky, and don't generate
confusing messages that their children died, because we also killed
them. We kinda fake an "atomic" killing of all system processes. (This is
something sysvinit did too, btw...)
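
To make the stop-then-kill idea concrete, here is a minimal Python sketch.
It is not systemd's actual code; the function names freeze_then_kill() and
demo() are made up for illustration, and it only touches child processes it
spawns itself. It freezes every target with SIGSTOP, queues SIGTERM for each
while nothing can run, and then sends SIGCONT so the pending signals get
delivered.

```python
import os
import signal
import subprocess
import sys


def freeze_then_kill(pids, sig=signal.SIGTERM):
    """Stop every target first, then signal them, then let them continue."""
    # Phase 1: freeze everything so no process can react to a sibling dying.
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)
    # Phase 2: queue the terminating signal while all targets are stopped.
    for pid in pids:
        os.kill(pid, sig)
    # Phase 3: resume the targets so the pending signal is acted upon.
    for pid in pids:
        os.kill(pid, signal.SIGCONT)


def demo():
    """Spawn three sleeping children and take them down together."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    procs = [subprocess.Popen(sleeper) for _ in range(3)]
    freeze_then_kill([p.pid for p in procs])
    # Popen reports death by signal N as a return code of -N.
    return [p.wait(timeout=10) for p in procs]


if __name__ == "__main__":
    print(demo())
```

demo() returns the children's exit statuses; a negative value means the
child was terminated by the signal with that number.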

Now, the SIGSTOP will be reported to your host's bash, since your host's
bash probably stays the parent of the container bash. That's why you get
the weird job control thing... (I wonder though what is actually
reported as PPID for your container shell inside the container?). Now,
somehow this weird parent/child relationship breaks SIGCONT, and hence
your nsenter process won't work?
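
The "reported to your host's bash" part can be reproduced without any
container. The following Python sketch (observe_stop() is a made-up name,
and the sleeping child merely stands in for the nsenter process) stops a
child with SIGSTOP and shows that the parent learns about it through
waitpid() with WUNTRACED, which is the mechanism a job-control shell uses to
print its "suspended (signal)" line.

```python
import os
import signal
import subprocess
import sys


def observe_stop():
    """Stop a child and report what the parent sees via waitpid()."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Someone else (here: we ourselves) stops the child.
    os.kill(child.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid() return for stopped children as well,
    # not only for terminated ones.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)
    stop_sig = os.WSTOPSIG(status) if stopped else None
    # Clean up: SIGKILL also takes effect on a stopped process.
    os.kill(child.pid, signal.SIGKILL)
    child.wait()
    return stopped, stop_sig


if __name__ == "__main__":
    print(observe_stop())
```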

I still wonder if we shouldn't try to be more careful when entering a
namespace. I.e., inheriting half of the process settings from the host,
and half from the container sounds semi-ugly... Maybe we enter the
namespace with an external process, but then use that to get PID 1 to
somehow start something for us, and then pass STDIN/STDOUT to that... So
instead of entering plus exec()ing internal code, we'd just enter, and
run external code... If you follow what I mean?

Hmm, or maybe like this: enter the container, allocate a pty,
instantiate getty@.service for that pty, pass pty fd back to host, exit
in namespace, and process pty on the host as long as we don't get EOF.
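
A rough Python sketch of the fd-passing half of that idea, under heavy
assumptions: it does not enter any namespace and does not talk to systemd.
helper_side() is a hypothetical stand-in for the code that would run inside
the container, and a tiny Python one-liner stands in for the getty. It only
shows allocating a pty, handing the master fd over a Unix socket
(socket.send_fds()/recv_fds(), Python 3.9+), and reading on the receiving
side until the pty is closed.

```python
import os
import socket
import subprocess
import sys


def helper_side(sock):
    """Stand-in for the in-container part: allocate a pty, pass it back."""
    master, slave = os.openpty()
    # Hypothetical placeholder for "instantiate a getty on that pty".
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('hello from the pty')"],
        stdin=slave, stdout=slave, stderr=slave,
    )
    os.close(slave)  # only the child keeps the slave end open
    socket.send_fds(sock, [b"pty"], [master])
    os.close(master)  # the copy in flight keeps the pty alive
    return proc


def host_side(sock):
    """Receive the pty master fd and read from it until it is closed."""
    _, fds, _, _ = socket.recv_fds(sock, 16, 1)
    master = fds[0]
    chunks = []
    try:
        while True:
            try:
                data = os.read(master, 1024)
            except OSError:
                # Linux reports EIO once the slave side is fully closed.
                break
            if not data:
                break
            chunks.append(data)
    finally:
        os.close(master)
    return b"".join(chunks)


def demo():
    a, b = socket.socketpair()
    with a, b:
        proc = helper_side(a)
        output = host_side(b)
        proc.wait()
    return output


if __name__ == "__main__":
    print(demo())
```

In a real implementation the helper would additionally have to join the
container's namespaces before allocating the pty, which this sketch leaves
out entirely.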

The pty handling we have in nspawn anyway, so this could be pretty
straightforward to hack up. 

Of course, this would only work if PID1 in the container is systemd, but
that should be OK. People can always use nsenter by hand if they have
different init systems in the container...

(BTW, we now can name containers in the system, so maybe we want to use
that in nspawn anyway, to introduce a new switch that gets you a getty
on an existing container, identified by its name...)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.