[systemd-devel] [PATCH 2/3] nspawn: make nspawn able to cleanly terminate on container errors
Djalal Harouni
tixxdz at opendz.org
Thu Apr 10 17:45:51 PDT 2014
nspawn and the container child use eventfd to wait and notify each other
that they are ready so the container setup can be completed.
However in its current form the wait/notify event ignore errors that
may especially affect the child (container).
On errors the child will jump to the "child_fail" label and terminate
with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
is created without the "EFD_NONBLOCK" flag, this leaves the parent
blocking on the eventfd_read() call.
To fix this without adding extra overheads, we keep the eventfd logic
and improve it by adding:
* States of the parent and child setups:
SETUP_INIT, SETUP_SUCCEEDED and SETUP_FAILED
* In the child if the setup succeeded we notify parent by writing a
SETUP_SUCCEEDED value, otherwise we make sure to write a SETUP_FAILED
before the _exit(). This prevents the parent from waiting on an event
that will never come.
* In parent read the child setup state, if SETUP_SUCCEEDED continue,
otherwise jump to "check_container_status" label, get the container
child status and release resources.
https://bugs.freedesktop.org/show_bug.cgi?id=76193
Reported-by: Tobias Hunger <tobias.hunger at gmail.com>
---
src/nspawn/nspawn.c | 50 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 12 deletions(-)
diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c
index d606bf2..21322f7 100644
--- a/src/nspawn/nspawn.c
+++ b/src/nspawn/nspawn.c
@@ -104,6 +104,16 @@ typedef enum LinkJournal {
LINK_GUEST
} LinkJournal;
+enum {
+ /* States of the parent (nspawn) and child (container) setups
+ * These are used for event notification so we can inform
+ * the other party if the appropriate setup succeeded or
+ * failed. */
+ SETUP_INIT,
+ SETUP_SUCCEEDED,
+ SETUP_FAILED
+};
+
static char *arg_directory = NULL;
static char *arg_user = NULL;
static sd_id128_t arg_uuid = {};
@@ -2815,16 +2825,16 @@ int main(int argc, char *argv[]) {
for (;;) {
ContainerStatus container_status;
+ eventfd_t event_status;
int parent_ready_fd = -1, child_ready_fd = -1;
- eventfd_t x;
- parent_ready_fd = eventfd(0, EFD_CLOEXEC);
+ parent_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
if (parent_ready_fd < 0) {
log_error("Failed to create event fd: %m");
goto finish;
}
- child_ready_fd = eventfd(0, EFD_CLOEXEC);
+ child_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
if (child_ready_fd < 0) {
log_error("Failed to create event fd: %m");
goto finish;
@@ -2980,7 +2990,7 @@ int main(int argc, char *argv[]) {
/* Tell the parent that we are ready, and that
* it can cgroupify us to that we lack access
* to certain devices and resources. */
- eventfd_write(child_ready_fd, 1);
+ eventfd_write(child_ready_fd, SETUP_SUCCEEDED);
child_ready_fd = safe_close(child_ready_fd);
if (chdir(arg_directory) < 0) {
@@ -3081,7 +3091,7 @@ int main(int argc, char *argv[]) {
env_use = (char**) envp;
/* Wait until the parent is ready with the setup, too... */
- eventfd_read(parent_ready_fd, &x);
+ eventfd_read(parent_ready_fd, &event_status);
parent_ready_fd = safe_close(parent_ready_fd);
if (arg_boot) {
@@ -3113,17 +3123,29 @@ int main(int argc, char *argv[]) {
log_error("execv() failed: %m");
child_fail:
+ /* Tell the parent that the setup failed, so he
+ * can clean up resources and terminate. */
+ if (child_ready_fd != -1) {
+ eventfd_write(child_ready_fd, SETUP_FAILED);
+ child_ready_fd = safe_close(child_ready_fd);
+ }
_exit(EXIT_FAILURE);
}
fdset_free(fds);
fds = NULL;
- /* Wait until the child reported that it is ready with
- * all it needs to do with priviliges. After we got
- * the notification we can make the process join its
- * cgroup which might limit what it can do */
- eventfd_read(child_ready_fd, &x);
+ /* Wait for the child event:
+ * If SETUP_FAILED, the child will terminate soon.
+ * If SETUP_SUCCEEDED, the child is reporting that it
+ * is ready with all it needs to do with priviliges.
+ * After we got the notification we can make the
+ * process join its cgroup which might limit what it
+ * can do */
+ r = eventfd_read(child_ready_fd, &event_status);
+ child_ready_fd = safe_close(child_ready_fd);
+ if (r < 0 || event_status == SETUP_FAILED)
+ goto check_container_status;
r = register_machine(pid);
if (r < 0)
@@ -3146,9 +3168,12 @@ int main(int argc, char *argv[]) {
goto finish;
/* Notify the child that the parent is ready with all
- * its setup, and thtat the child can now hand over
+ * its setup, and that the child can now hand over
* control to the code to run inside the container. */
- eventfd_write(parent_ready_fd, 1);
+ r = eventfd_write(parent_ready_fd, SETUP_SUCCEEDED);
+ parent_ready_fd = safe_close(parent_ready_fd);
+ if (r < 0)
+ goto finish;
k = process_pty(master, &mask, arg_boot ? pid : 0, SIGRTMIN+3);
if (k < 0) {
@@ -3162,6 +3187,7 @@ int main(int argc, char *argv[]) {
/* Kill if it is not dead yet anyway */
terminate_machine(pid);
+check_container_status:
/* Redundant, but better safe than sorry */
kill(pid, SIGKILL);
--
1.8.5.3
More information about the systemd-devel
mailing list