[PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal
Kamil Konieczny
kamil.konieczny at linux.intel.com
Tue Sep 10 15:58:14 UTC 2024
Hi Peter,
On 2024-09-03 at 14:05:05 +0200, Peter Senna Tschudin wrote:
> Make igt-runner aware about tests being killed by signals. Before this
> patch, manually killing a test process would result in igt-runner silently
> marking the test as incomplete.
>
> Now igt-runner aborts the run verbosely. As an example the following was
> extracted from results.json:
> This test caused an abort condition: Test terminated by a signal Killed (-9).
>
> v5: do not use sigdescr_np() as it seems to be a fairly new lib function that does
> not compile on older Ubuntu
> v4: improve abort code path to not interfere with igt-runner timeouts
> v3: do not interfere with igt-runner killing tests due to timeout and diskspace
> v2: fix race condition
>
> Cc: Petri Latvala <adrinael at adrinael.net>
> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> Signed-off-by: Peter Senna Tschudin <peter.senna at intel.com>
Your signed-off-by is different from your e-mail:
Peter Senna Tschudin <peter.senna at linux.intel.com>
checkpatch.pl script warns:
100: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Peter Senna Tschudin <peter.senna at linux.intel.com>' != 'Signed-off-by: Peter Senna Tschudin <peter.senna at intel.com>'
> ---
> runner/executor.c | 38 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/runner/executor.c b/runner/executor.c
> index ac73e1dde..990d932f3 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -888,6 +888,8 @@ static int monitor_output(pid_t child,
> const int interval_length = 1;
> int wd_timeout;
> int killed = 0; /* 0 if not killed, signal number otherwise */
> + bool child_reaped = false;
> + bool child_killed_by_signal = false;
> struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
> unsigned long taints = 0;
> bool aborting = false;
> @@ -960,6 +962,25 @@ static int monitor_output(pid_t child,
>
> igt_gettime(&time_now);
>
> + /* Testing for !killed to prevent aborting too early after igt-runner
> + * decides to kill a process.
> + */
> + if (!killed && (child == waitpid(child, &status, WNOHANG))) {
> + child_reaped = true;
> + if (WIFSIGNALED(status)) {
> + child_killed_by_signal = true;
> + killed = WTERMSIG(status);
> +
> + /*
> + * Do not abort just yet, because igt-runner can kill the test
> + * due to a timeout for example. Aborting here prevents
> + * igt-runner from reporting a timeout. The code that aborts
> + * the run after the test was killed is at the end of the
> + * while() loop.
> + */
> + }
> + }
> +
Why is it here? Imho better place it at second hunk.
> /* TODO: Refactor these handlers to their own functions */
> if (outfd >= 0 && FD_ISSET(outfd, &set)) {
> char *newline;
> @@ -1241,7 +1262,11 @@ static int monitor_output(pid_t child,
> errf("Error reading from signalfd: %m\n");
> continue;
> } else if (siginfo.ssi_signo == SIGCHLD) {
> - if (child != waitpid(child, &status, WNOHANG)) {
This code here do what you wrote at first hunk:
if (child != waitpid(child, &status, WNOHANG)) {
// fatal err, set status=9999
} else { // all 'else if' checks 'child == '
}
The only difference is that your code calls waitpid() unconditionally
before loop. Could you write some testing code in one simple test
and make checks on trybot? e.g. to not rely that your code will
catch such scenarios if they are rarly seen but to test them in
predictable way. You could also add here some debugs.
Regards,
Kamil
> + if (!child_reaped) {
> + if (child == waitpid(child, &status, WNOHANG))
> + child_reaped = true;
> + }
> + if (!child_reaped) {
> errf("Failed to reap child\n");
> status = 9999;
> } else if (WIFEXITED(status)) {
> @@ -1483,6 +1508,17 @@ static int monitor_output(pid_t child,
> return -1;
> time_killed = time_now;
> }
> +
> + if (child_killed_by_signal) {
> + aborting = true;
> +
> + sprintf(buf, "Test terminated by a signal %s (%d).\n",
> + strsignal(killed), -killed);
> + errf("%s", buf);
> + *abortreason = strdup(buf);
> +
> + break;
> + }
> }
>
> dump_dmesg(kmsgfd, outputs[_F_DMESG]);
> --
> 2.34.1
>
More information about the igt-dev
mailing list