[igt-dev] [PATCH i-g-t 2/2] runner: Also generate igt at runner@aborted when aborting internally

Wed Jul 29 12:13:33 UTC 2020

On Wed, Jul 29, 2020 at 01:05:10PM +0100, Chris Wilson wrote:
> Quoting Petri Latvala (2020-07-29 12:34:25)
> > If we can't kill the (main) test process, or when the test process
> > exits with IGT_EXIT_ABORT, we abort the execution. Pass that
> > information along to the other machinery that tracks whether we
> > aborted, thus also getting that information to the end user in the
> > form of the pseudo-result igt at runner@aborted.
> > 
> > Signed-off-by: Petri Latvala <petri.latvala at intel.com>
> > Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz at intel.com>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  runner/executor.c | 47 ++++++++++++++++++++++++++---------------------
> >  1 file changed, 26 insertions(+), 21 deletions(-)
> > 
> > diff --git a/runner/executor.c b/runner/executor.c
> > index 77818987..0377195d 100644
> > --- a/runner/executor.c
> > +++ b/runner/executor.c
> > @@ -795,10 +795,11 @@ static int next_kill_signal(int killed)
> >   *  >0 - Timeout happened, need to recreate from journal
> >   */
> >  static int monitor_output(pid_t child,
> > -                          int outfd, int errfd, int kmsgfd, int sigfd,
> > -                          int *outputs,
> > -                          double *time_spent,
> > -                          struct settings *settings)
> > +                         int outfd, int errfd, int kmsgfd, int sigfd,
> > +                         int *outputs,
> > +                         double *time_spent,
> > +                         struct settings *settings,
> > +                         char **abortreason)
> >  {
> >         fd_set set;
> >         char buf[2048];
> > @@ -1116,6 +1117,7 @@ static int monitor_output(pid_t child,
> >                                 if (status == IGT_EXIT_ABORT) {
> >                                         errf("Test exited with IGT_EXIT_ABORT, aborting.\n");
> >                                         aborting = true;
> > +                                       *abortreason = strdup("Test exited with IGT_EXIT_ABORT");
> >                                 }
> >  
> >                                 if (time_spent)
> > @@ -1142,6 +1144,7 @@ static int monitor_output(pid_t child,
> >                                         if (kill(child, 0) && errno == ESRCH)
> >                                                 errf("The test process no longer exists, "
> >                                                      "but we didn't get informed of its demise...\n");
> > +                                       asprintf(abortreason, "Child refuses to die, tainted 0x%lx.", taints);
> >                                 }
> >  
> >                                 dump_dmesg(kmsgfd, outputs[_F_DMESG]);
> > @@ -1309,7 +1312,8 @@ static int execute_next_entry(struct execute_state *state,
> >                               struct settings *settings,
> >                               struct job_list_entry *entry,
> >                               int testdirfd, int resdirfd,
> > -                             int sigfd, sigset_t *sigmask)
> > +                             int sigfd, sigset_t *sigmask,
> > +                             char **abortreason)
> >  {
> >         int dirfd;
> >         int outputs[_F_LAST];
> > @@ -1406,7 +1410,7 @@ static int execute_next_entry(struct execute_state *state,
> >         outpipe[1] = errpipe[1] = -1;
> >  
> >         result = monitor_output(child, outfd, errfd, kmsgfd, sigfd,
> > -                               outputs, time_spent, settings);
> > +                               outputs, time_spent, settings, abortreason);
> >  
> >  out_kmsgfd:
> >         close(kmsgfd);
> > @@ -1783,7 +1787,7 @@ bool execute(struct execute_state *state,
> >  
> >         for (; state->next < job_list->size;
> >              state->next++) {
> > -               char *reason;
> > +               char *reason = NULL;
> >                 int result;
> >  
> >                 if (should_die_because_signal(sigfd)) {
> > @@ -1797,7 +1801,21 @@ bool execute(struct execute_state *state,
> >                                             settings,
> >                                             &job_list->entries[state->next],
> >                                             testdirfd, resdirfd,
> > -                                           sigfd, &sigmask);
> > +                                           sigfd, &sigmask,
> > +                                           &reason);
> > +
> > +               if (reason != NULL || (reason = need_to_abort(settings)) != NULL) {
> > +                       char *prev = entry_display_name(&job_list->entries[state->next]);
> > +                       char *next = (state->next + 1 < job_list->size ?
> > +                                     entry_display_name(&job_list->entries[state->next + 1]) :
> > +                                     strdup("nothing"));
> > +                       write_abort_file(resdirfd, reason, prev, next);
> > +                       free(prev);
> > +                       free(next);
> > +                       free(reason);
> > +                       status = false;
> > +                       break;
> > +               }
> 
> Ok, so this now checks for the termination before checking for the
> overall timeout. That seems fine.
> 
> The question is whether that should be flagged as an igt at runner@aborted
> as well since that is also an abnormal condition preventing the
> execution of later tests.

overall-timeout is a "soft" stop in the sense that the expected use
case is run-until-overall-timeout, then invoke igt_resume. Rinse and
repeat until done. Abnormal condition only until you resume.

A result is coming for this, but it won't be igt at runner@aborted, it
will be under another name. The main difference being that resuming an
execution that overall-timeouted must remove that data, because it
will be used for "hey you, you generated result pages for an execution
that wasn't fully done yet," while @aborted is used for "hey at least
one test in this set did the nasties" and that data must stay no
matter what.

-- 
Petri Latvala