[PATCH i-g-t v6] runner/executor: Abort when child process is killed by a signal
Kamil Konieczny
kamil.konieczny at linux.intel.com
Fri Oct 11 15:18:09 UTC 2024
Hi Peter,
On 2024-10-03 at 15:27:37 +0200, Peter Senna Tschudin wrote:
> Manually killing a test process results in igt-runner silently marking the
> test as incomplete. Change the behavior to abort verbosely when a test is
> killed.
This is how it was designed, test ends with one of SUCCESS/SKIP/FAIL and
then runner marks it as completed. Missing this info means test was
incomplete.
>
> In order for the new behavior to work, child termination is probed on
> every iteration of the while loop inside monitor_output(). Add the
> bool test_timed_out to track when igt_runner is intentionally terminating
> a test, and do not interfere with it.
>
> Tested by:
> - using the --per-test-timeout flag and checking that results.json labels
> the test as a timeout.
> - manually killing a test process and checking that results.json labels
> the test as an abort with a message stating the test was killed.
We strive to have tests as programs/helpers/scripts so they can be
repeatedly tested by CI. If you want to have your changes accepted
you should write it.
Almost all signals are catched and reported by tests itself. A missing
one is SIGKILL which only runner (parent process) can report.
Regards,
Kamil
>
> v6: - clear code comments
> - move child termination probing closer to where sigfd is handled
> - add a bool test_timed_out to ensure we will not interfere with
> igt_runner intentionally killing tests.
> - move the aborting code to outside the while loop
>
> v5: do not use sigdescr_np() as it seems to be a fairly new lib function
> that does not compile on older Ubuntu
> v4: improve abort code path to not interfere with igt-runner timeouts
> v3: do not interfere with igt-runner killing tests due to timeout and
> diskspace
> v2: fix race condition
>
> Cc: Petri Latvala <adrinael at adrinael.net>
> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
> ---
> runner/executor.c | 29 +++++++++++++++++++++++++++--
> 1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/runner/executor.c b/runner/executor.c
> index ac73e1dde..69b4ed939 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -888,12 +888,15 @@ static int monitor_output(pid_t child,
> const int interval_length = 1;
> int wd_timeout;
> int killed = 0; /* 0 if not killed, signal number otherwise */
> + bool child_reaped = false;
> + bool child_killed_by_signal = false;
> struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
> unsigned long taints = 0;
> bool aborting = false;
> size_t disk_usage = 0;
> bool socket_comms_used = false; /* whether the test actually uses comms */
> bool results_received = false; /* whether we already have test results that might need overriding if we detect an abort condition */
> + bool test_timed_out = false;
>
> igt_gettime(&time_beg);
> time_last_activity = time_last_subtest = time_killed = time_beg;
> @@ -1233,6 +1236,14 @@ static int monitor_output(pid_t child,
> }
> }
>
> + /* Always check for abort conditions */
> + if (child == waitpid(child, &status, WNOHANG)) {
> + child_reaped = true;
> + if (WIFSIGNALED(status)) {
> + child_killed_by_signal = true;
> + killed = WTERMSIG(status);
> + }
> + }
> if (sigfd >= 0 && FD_ISSET(sigfd, &set)) {
> double time;
>
> @@ -1241,7 +1252,12 @@ static int monitor_output(pid_t child,
> errf("Error reading from signalfd: %m\n");
> continue;
> } else if (siginfo.ssi_signo == SIGCHLD) {
> - if (child != waitpid(child, &status, WNOHANG)) {
> + if (!child_reaped) {
> + /* Was child killed since we last checked? */
> + if (child == waitpid(child, &status, WNOHANG))
> + child_reaped = true;
> + }
> + if (!child_reaped) {
> errf("Failed to reap child\n");
> status = 9999;
> } else if (WIFEXITED(status)) {
> @@ -1303,7 +1319,6 @@ static int monitor_output(pid_t child,
> fdatasync(outputs[_F_JOURNAL]);
> }
> }
> -
> aborting = true;
> killed = SIGQUIT;
> if (!kill_child(killed, child))
> @@ -1447,6 +1462,7 @@ static int monitor_output(pid_t child,
> disk_usage);
>
> if (timeout_reason) {
> + test_timed_out = true;
> if (killed == SIGKILL) {
> /* Nothing that can be done, really. Let's tell the caller we want to abort. */
>
> @@ -1485,6 +1501,15 @@ static int monitor_output(pid_t child,
> }
> }
>
> + if (!test_timed_out && child_killed_by_signal) {
> + sprintf(buf, "Test terminated by a signal %s (%d).\n",
> + strsignal(killed), -killed);
> + errf("%s", buf);
> +
> + *abortreason = strdup(buf);
> + aborting = true;
> + }
> +
> dump_dmesg(kmsgfd, outputs[_F_DMESG]);
> if (settings->sync)
> fdatasync(outputs[_F_DMESG]);
> --
> 2.34.1
>
More information about the igt-dev
mailing list