[systemd-devel] [PATCH 2/3] nspawn: make nspawn able to cleanly terminate on container errors

Tom Gundersen teg at jklm.no
Fri Apr 25 11:12:13 PDT 2014


On Fri, Apr 11, 2014 at 2:45 AM, Djalal Harouni <tixxdz at opendz.org> wrote:
> nspawn and the container child use eventfd to wait and notify each other
> that they are ready so the container setup can be completed.
>
> However in its current form the wait/notify event ignore errors that
> may especially affect the child (container).
>
> On errors the child will jump to the "child_fail" label and terminate
> with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
> is created without the "EFD_NONBLOCK" flag, this leaves the parent
> blocking on the eventfd_read() call.
>
> To fix this without adding extra overheads, we keep the eventfd logic
> and improve it by adding:
>
> * States of the parent and child setups:
>   SETUP_INIT, SETUP_SUCCEEDED and SETUP_FAILED
>
> * In the child if the setup succeeded we notify parent by writing a
>   SETUP_SUCCEEDED value, otherwise we make sure to write a SETUP_FAILED
>   before the _exit(). This prevents the parent from waiting on an event
>   that will never come.
>
> * In parent read the child setup state, if SETUP_SUCCEEDED continue,
>   otherwise jump to "check_container_status" label, get the container
>   child status and release resources.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=76193
>
> Reported-by: Tobias Hunger <tobias.hunger at gmail.com>

Looks good to me.

Cheers,

Tom


> ---
>  src/nspawn/nspawn.c | 50 ++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c
> index d606bf2..21322f7 100644
> --- a/src/nspawn/nspawn.c
> +++ b/src/nspawn/nspawn.c
> @@ -104,6 +104,16 @@ typedef enum LinkJournal {
>          LINK_GUEST
>  } LinkJournal;
>
> +enum {
> +        /* States of the parent (nspawn) and child (container) setups
> +         * These are used for event notification so we can inform
> +         * the other party if the appropriate setup succeeded or
> +         * failed. */
> +        SETUP_INIT,
> +        SETUP_SUCCEEDED,
> +        SETUP_FAILED
> +};
> +
>  static char *arg_directory = NULL;
>  static char *arg_user = NULL;
>  static sd_id128_t arg_uuid = {};
> @@ -2815,16 +2825,16 @@ int main(int argc, char *argv[]) {
>
>          for (;;) {
>                  ContainerStatus container_status;
> +                eventfd_t event_status;
>                  int parent_ready_fd = -1, child_ready_fd = -1;
> -                eventfd_t x;
>
> -                parent_ready_fd = eventfd(0, EFD_CLOEXEC);
> +                parent_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
>                  if (parent_ready_fd < 0) {
>                          log_error("Failed to create event fd: %m");
>                          goto finish;
>                  }
>
> -                child_ready_fd = eventfd(0, EFD_CLOEXEC);
> +                child_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
>                  if (child_ready_fd < 0) {
>                          log_error("Failed to create event fd: %m");
>                          goto finish;
> @@ -2980,7 +2990,7 @@ int main(int argc, char *argv[]) {
>                          /* Tell the parent that we are ready, and that
>                           * it can cgroupify us to that we lack access
>                           * to certain devices and resources. */
> -                        eventfd_write(child_ready_fd, 1);
> +                        eventfd_write(child_ready_fd, SETUP_SUCCEEDED);
>                          child_ready_fd = safe_close(child_ready_fd);
>
>                          if (chdir(arg_directory) < 0) {
> @@ -3081,7 +3091,7 @@ int main(int argc, char *argv[]) {
>                                  env_use = (char**) envp;
>
>                          /* Wait until the parent is ready with the setup, too... */
> -                        eventfd_read(parent_ready_fd, &x);
> +                        eventfd_read(parent_ready_fd, &event_status);
>                          parent_ready_fd = safe_close(parent_ready_fd);
>
>                          if (arg_boot) {
> @@ -3113,17 +3123,29 @@ int main(int argc, char *argv[]) {
>                          log_error("execv() failed: %m");
>
>                  child_fail:
> +                        /* Tell the parent that the setup failed,  so he
> +                         * can clean up resources and terminate. */
> +                        if (child_ready_fd != -1) {
> +                                eventfd_write(child_ready_fd, SETUP_FAILED);
> +                                child_ready_fd = safe_close(child_ready_fd);
> +                        }
>                          _exit(EXIT_FAILURE);
>                  }
>
>                  fdset_free(fds);
>                  fds = NULL;
>
> -                /* Wait until the child reported that it is ready with
> -                 * all it needs to do with priviliges. After we got
> -                 * the notification we can make the process join its
> -                 * cgroup which might limit what it can do */
> -                eventfd_read(child_ready_fd, &x);
> +                /* Wait for the child event:
> +                 * If SETUP_FAILED, the child will terminate soon.
> +                 * If SETUP_SUCCEEDED, the child is reporting that it
> +                 * is ready with all it needs to do with priviliges.
> +                 * After we got the notification we can make the
> +                 * process join its cgroup which might limit what it
> +                 * can do */
> +                r = eventfd_read(child_ready_fd, &event_status);
> +                child_ready_fd = safe_close(child_ready_fd);
> +                if (r < 0 || event_status == SETUP_FAILED)
> +                        goto check_container_status;
>
>                  r = register_machine(pid);
>                  if (r < 0)
> @@ -3146,9 +3168,12 @@ int main(int argc, char *argv[]) {
>                          goto finish;
>
>                  /* Notify the child that the parent is ready with all
> -                 * its setup, and thtat the child can now hand over
> +                 * its setup, and that the child can now hand over
>                   * control to the code to run inside the container. */
> -                eventfd_write(parent_ready_fd, 1);
> +                r = eventfd_write(parent_ready_fd, SETUP_SUCCEEDED);
> +                parent_ready_fd = safe_close(parent_ready_fd);
> +                if (r < 0)
> +                        goto finish;
>
>                  k = process_pty(master, &mask, arg_boot ? pid : 0, SIGRTMIN+3);
>                  if (k < 0) {
> @@ -3162,6 +3187,7 @@ int main(int argc, char *argv[]) {
>                  /* Kill if it is not dead yet anyway */
>                  terminate_machine(pid);
>
> +check_container_status:
>                  /* Redundant, but better safe than sorry */
>                  kill(pid, SIGKILL);
>
> --
> 1.8.5.3
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel


More information about the systemd-devel mailing list