[systemd-devel] [PATCH 2/3] nspawn: make nspawn able to cleanly terminate on container errors
Tom Gundersen
teg at jklm.no
Fri Apr 25 11:12:13 PDT 2014
On Fri, Apr 11, 2014 at 2:45 AM, Djalal Harouni <tixxdz at opendz.org> wrote:
> nspawn and the container child use eventfd to wait and notify each other
> that they are ready so the container setup can be completed.
>
> However in its current form the wait/notify event ignore errors that
> may especially affect the child (container).
>
> On errors the child will jump to the "child_fail" label and terminate
> with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
> is created without the "EFD_NONBLOCK" flag, this leaves the parent
> blocking on the eventfd_read() call.
>
> To fix this without adding extra overheads, we keep the eventfd logic
> and improve it by adding:
>
> * States of the parent and child setups:
> SETUP_INIT, SETUP_SUCCEEDED and SETUP_FAILED
>
> * In the child if the setup succeeded we notify parent by writing a
> SETUP_SUCCEEDED value, otherwise we make sure to write a SETUP_FAILED
> before the _exit(). This prevents the parent from waiting on an event
> that will never come.
>
> * In parent read the child setup state, if SETUP_SUCCEEDED continue,
> otherwise jump to "check_container_status" label, get the container
> child status and release resources.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=76193
>
> Reported-by: Tobias Hunger <tobias.hunger at gmail.com>
Looks good to me.
Cheers,
Tom
> ---
> src/nspawn/nspawn.c | 50 ++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c
> index d606bf2..21322f7 100644
> --- a/src/nspawn/nspawn.c
> +++ b/src/nspawn/nspawn.c
> @@ -104,6 +104,16 @@ typedef enum LinkJournal {
> LINK_GUEST
> } LinkJournal;
>
> +enum {
> + /* States of the parent (nspawn) and child (container) setups
> + * These are used for event notification so we can inform
> + * the other party if the appropriate setup succeeded or
> + * failed. */
> + SETUP_INIT,
> + SETUP_SUCCEEDED,
> + SETUP_FAILED
> +};
> +
> static char *arg_directory = NULL;
> static char *arg_user = NULL;
> static sd_id128_t arg_uuid = {};
> @@ -2815,16 +2825,16 @@ int main(int argc, char *argv[]) {
>
> for (;;) {
> ContainerStatus container_status;
> + eventfd_t event_status;
> int parent_ready_fd = -1, child_ready_fd = -1;
> - eventfd_t x;
>
> - parent_ready_fd = eventfd(0, EFD_CLOEXEC);
> + parent_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
> if (parent_ready_fd < 0) {
> log_error("Failed to create event fd: %m");
> goto finish;
> }
>
> - child_ready_fd = eventfd(0, EFD_CLOEXEC);
> + child_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC);
> if (child_ready_fd < 0) {
> log_error("Failed to create event fd: %m");
> goto finish;
> @@ -2980,7 +2990,7 @@ int main(int argc, char *argv[]) {
> /* Tell the parent that we are ready, and that
> * it can cgroupify us to that we lack access
> * to certain devices and resources. */
> - eventfd_write(child_ready_fd, 1);
> + eventfd_write(child_ready_fd, SETUP_SUCCEEDED);
> child_ready_fd = safe_close(child_ready_fd);
>
> if (chdir(arg_directory) < 0) {
> @@ -3081,7 +3091,7 @@ int main(int argc, char *argv[]) {
> env_use = (char**) envp;
>
> /* Wait until the parent is ready with the setup, too... */
> - eventfd_read(parent_ready_fd, &x);
> + eventfd_read(parent_ready_fd, &event_status);
> parent_ready_fd = safe_close(parent_ready_fd);
>
> if (arg_boot) {
> @@ -3113,17 +3123,29 @@ int main(int argc, char *argv[]) {
> log_error("execv() failed: %m");
>
> child_fail:
> + /* Tell the parent that the setup failed, so he
> + * can clean up resources and terminate. */
> + if (child_ready_fd != -1) {
> + eventfd_write(child_ready_fd, SETUP_FAILED);
> + child_ready_fd = safe_close(child_ready_fd);
> + }
> _exit(EXIT_FAILURE);
> }
>
> fdset_free(fds);
> fds = NULL;
>
> - /* Wait until the child reported that it is ready with
> - * all it needs to do with priviliges. After we got
> - * the notification we can make the process join its
> - * cgroup which might limit what it can do */
> - eventfd_read(child_ready_fd, &x);
> + /* Wait for the child event:
> + * If SETUP_FAILED, the child will terminate soon.
> + * If SETUP_SUCCEEDED, the child is reporting that it
> + * is ready with all it needs to do with priviliges.
> + * After we got the notification we can make the
> + * process join its cgroup which might limit what it
> + * can do */
> + r = eventfd_read(child_ready_fd, &event_status);
> + child_ready_fd = safe_close(child_ready_fd);
> + if (r < 0 || event_status == SETUP_FAILED)
> + goto check_container_status;
>
> r = register_machine(pid);
> if (r < 0)
> @@ -3146,9 +3168,12 @@ int main(int argc, char *argv[]) {
> goto finish;
>
> /* Notify the child that the parent is ready with all
> - * its setup, and thtat the child can now hand over
> + * its setup, and that the child can now hand over
> * control to the code to run inside the container. */
> - eventfd_write(parent_ready_fd, 1);
> + r = eventfd_write(parent_ready_fd, SETUP_SUCCEEDED);
> + parent_ready_fd = safe_close(parent_ready_fd);
> + if (r < 0)
> + goto finish;
>
> k = process_pty(master, &mask, arg_boot ? pid : 0, SIGRTMIN+3);
> if (k < 0) {
> @@ -3162,6 +3187,7 @@ int main(int argc, char *argv[]) {
> /* Kill if it is not dead yet anyway */
> terminate_machine(pid);
>
> +check_container_status:
> /* Redundant, but better safe than sorry */
> kill(pid, SIGKILL);
>
> --
> 1.8.5.3
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
More information about the systemd-devel
mailing list