[igt-dev] [PATCH i-g-t 2/2] runner: Also generate igt at runner@aborted when aborting internally
Petri Latvala
petri.latvala at intel.com
Wed Jul 29 12:13:33 UTC 2020
On Wed, Jul 29, 2020 at 01:05:10PM +0100, Chris Wilson wrote:
> Quoting Petri Latvala (2020-07-29 12:34:25)
> > If we can't kill the (main) test process, or when the test process
> > exits with IGT_EXIT_ABORT, we abort the execution. Pass that
> > information along to the other machinery that tracks whether we
> > aborted, thus also getting that information to the end user in the
> > form of the pseudo-result igt at runner@aborted.
> >
> > Signed-off-by: Petri Latvala <petri.latvala at intel.com>
> > Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz at intel.com>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > runner/executor.c | 47 ++++++++++++++++++++++++++---------------------
> > 1 file changed, 26 insertions(+), 21 deletions(-)
> >
> > diff --git a/runner/executor.c b/runner/executor.c
> > index 77818987..0377195d 100644
> > --- a/runner/executor.c
> > +++ b/runner/executor.c
> > @@ -795,10 +795,11 @@ static int next_kill_signal(int killed)
> > * >0 - Timeout happened, need to recreate from journal
> > */
> > static int monitor_output(pid_t child,
> > - int outfd, int errfd, int kmsgfd, int sigfd,
> > - int *outputs,
> > - double *time_spent,
> > - struct settings *settings)
> > + int outfd, int errfd, int kmsgfd, int sigfd,
> > + int *outputs,
> > + double *time_spent,
> > + struct settings *settings,
> > + char **abortreason)
> > {
> > fd_set set;
> > char buf[2048];
> > @@ -1116,6 +1117,7 @@ static int monitor_output(pid_t child,
> > if (status == IGT_EXIT_ABORT) {
> > errf("Test exited with IGT_EXIT_ABORT, aborting.\n");
> > aborting = true;
> > + *abortreason = strdup("Test exited with IGT_EXIT_ABORT");
> > }
> >
> > if (time_spent)
> > @@ -1142,6 +1144,7 @@ static int monitor_output(pid_t child,
> > if (kill(child, 0) && errno == ESRCH)
> > errf("The test process no longer exists, "
> > "but we didn't get informed of its demise...\n");
> > + asprintf(abortreason, "Child refuses to die, tainted 0x%lx.", taints);
> > }
> >
> > dump_dmesg(kmsgfd, outputs[_F_DMESG]);
> > @@ -1309,7 +1312,8 @@ static int execute_next_entry(struct execute_state *state,
> > struct settings *settings,
> > struct job_list_entry *entry,
> > int testdirfd, int resdirfd,
> > - int sigfd, sigset_t *sigmask)
> > + int sigfd, sigset_t *sigmask,
> > + char **abortreason)
> > {
> > int dirfd;
> > int outputs[_F_LAST];
> > @@ -1406,7 +1410,7 @@ static int execute_next_entry(struct execute_state *state,
> > outpipe[1] = errpipe[1] = -1;
> >
> > result = monitor_output(child, outfd, errfd, kmsgfd, sigfd,
> > - outputs, time_spent, settings);
> > + outputs, time_spent, settings, abortreason);
> >
> > out_kmsgfd:
> > close(kmsgfd);
> > @@ -1783,7 +1787,7 @@ bool execute(struct execute_state *state,
> >
> > for (; state->next < job_list->size;
> > state->next++) {
> > - char *reason;
> > + char *reason = NULL;
> > int result;
> >
> > if (should_die_because_signal(sigfd)) {
> > @@ -1797,7 +1801,21 @@ bool execute(struct execute_state *state,
> > settings,
> > &job_list->entries[state->next],
> > testdirfd, resdirfd,
> > - sigfd, &sigmask);
> > + sigfd, &sigmask,
> > + &reason);
> > +
> > + if (reason != NULL || (reason = need_to_abort(settings)) != NULL) {
> > + char *prev = entry_display_name(&job_list->entries[state->next]);
> > + char *next = (state->next + 1 < job_list->size ?
> > + entry_display_name(&job_list->entries[state->next + 1]) :
> > + strdup("nothing"));
> > + write_abort_file(resdirfd, reason, prev, next);
> > + free(prev);
> > + free(next);
> > + free(reason);
> > + status = false;
> > + break;
> > + }
>
> Ok, so this now checks for the termination before checking for the
> overall timeout. That seems fine.
>
> The question is whether that should be flagged as an igt at runner@aborted
> as well since that is also an abnormal condition preventing the
> execution of later tests.
overall-timeout is a "soft" stop in the sense that the expected use
case is run-until-overall-timeout, then invoke igt_resume. Rinse and
repeat until done. Abnormal condition only until you resume.
A result is coming for this, but it won't be igt at runner@aborted, it
will be under another name. The main difference being that resuming an
execution that overall-timeouted must remove that data, because it
will be used for "hey you, you generated result pages for an execution
that wasn't fully done yet," while @aborted is used for "hey at least
one test in this set did the nasties" and that data must stay no
matter what.
--
Petri Latvala
More information about the igt-dev
mailing list