[systemd-devel] [PATCH v3 2/2] nspawn: make nspawn robust to container failure

Mon Jun 2 04:32:53 PDT 2014

On Sun, May 25, 2014 at 05:28:13AM +0200, Lennart Poettering wrote:
> On Sat, 24.05.14 14:58, Djalal Harouni (tixxdz at opendz.org) wrote:
> 
> Applied both. Thanks!
Ok, thanks!

> However, I am not too convinced about the clone() thing in
> shared/eventfd-util.[ch]. That sounds too specific to be shared between
> more than one tool. I have the suspicion that we really should move that
> code back into nspawn.
Ok, no problem. I was experimenting with some code to enter the container,
and I needed to deal with races, so I just did that.

> Anyway, merged this for now, as we can fix this later, and nspawn is
> moving so quickly that this is likely to get fixed soon.
Yes, this saves me another round of git rebase, which I'll probably do
later anyway to fix another pending bug:

When dealing with those races, I noticed that we can hit the error path
in src/nspawn/nspawn.c:register_machine():1317 when rebooting the container.

Output:

...
Unmounting /sys/kernel/config.
Unmounting /sys/kernel/debug.
Unmounting /dev/mqueue.
All filesystems unmounted.
Storage is finalized.
Rebooting.

Container fedora-tree is being rebooted.
Failed to register machine: Unit machine-fedora\x2dtree.scope already exists.

So the register_machine() call for the next boot runs while the previous
terminate_machine() => org.freedesktop.machine1.Machine.Terminate request
is still pending, and the old scope unit is still there.

I haven't had time to think about it yet. Maybe we can try the same
"abandoned" logic, or just check the status of the machine and delay the
registration?
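
A rough sketch of the "delay the register" idea, just to make it concrete.
This is not nspawn code: register_with_retry(), register_fn, struct
fake_state and fake_register() are hypothetical names made up for
illustration, and mapping the "already exists" failure to -EEXIST is an
assumption. The real register_machine() goes through the bus and may report
the error differently.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback standing in for the real registration call.
 * Assumed to return 0 on success, -EEXIST while the old scope unit
 * still exists, or another negative errno on unrelated failures. */
typedef int (*register_fn)(void *userdata);

/* Retry the registration while it fails with -EEXIST, sleeping
 * delay_ms between attempts. Any other result is returned at once. */
static int register_with_retry(register_fn fn, void *userdata,
                               unsigned max_attempts, long delay_ms) {
        assert(fn);
        assert(max_attempts > 0);

        int r = -EEXIST;

        for (unsigned i = 0; i < max_attempts; i++) {
                r = fn(userdata);
                if (r != -EEXIST)
                        return r; /* success, or an unrelated error */

                /* No point in sleeping after the last attempt. */
                if (i + 1 < max_attempts) {
                        struct timespec ts = {
                                .tv_sec = delay_ms / 1000,
                                .tv_nsec = (delay_ms % 1000) * 1000000L,
                        };
                        nanosleep(&ts, NULL);
                }
        }

        return r; /* still -EEXIST: the old unit never went away */
}

/* Fake registration for trying the loop out: pretends the old unit
 * lingers for 'pending' more calls and counts how often it is called. */
struct fake_state {
        unsigned pending;
        unsigned calls;
};

static int fake_register(void *userdata) {
        struct fake_state *s = userdata;

        s->calls++;
        if (s->pending > 0) {
                s->pending--;
                return -EEXIST;
        }
        return 0;
}
```

With a fake state of { .pending = 2 }, register_with_retry(fake_register,
&s, 5, 100) fails twice and then succeeds on the third call. A blind retry
like this only papers over the race, so checking the machine status or the
"abandoned" approach may still be the better fix.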

Hmm, I'll look into it!

-- 
Djalal Harouni
http://opendz.org