[PATCH i-g-t v6] runner/executor: Abort when child process is killed by a signal
Peter Senna Tschudin
peter.senna at linux.intel.com
Thu Oct 3 13:27:37 UTC 2024
Manually killing a test process results in igt-runner silently marking the
test as incomplete. Change the behavior to abort verbosely when a test is
killed.
For the new behavior to work, child termination is probed on every
iteration of the while loop inside monitor_output(). Add the bool
test_timed_out to track when igt_runner is intentionally terminating a
test, so that the new abort path does not interfere with it.
Tested by:
- using the --per-test-timeout flag and checking that results.json labels
the test as a timeout.
- manually killing a test process and checking that results.json labels
the test as an abort with a message stating the test was killed.
v6: - clear code comments
- move child termination probing closer to where sigfd is handled
- add a bool test_timed_out to ensure we will not interfere with
igt_runner intentionally killing tests.
- move the aborting code to outside the while loop
v5: do not use sigdescr_np() as it seems to be a fairly new lib function
that does not compile on older Ubuntu
v4: improve abort code path to not interfere with igt-runner timeouts
v3: do not interfere with igt-runner killing tests due to timeout and
diskspace
v2: fix race condition
Cc: Petri Latvala <adrinael at adrinael.net>
Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
---
runner/executor.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)
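
Note for readers: the probing described in the commit message can be
sketched outside the runner as below. This is only a minimal
illustration, assuming a POSIX system. The names probe_result,
probe_child(), should_abort() and demo_kill_and_probe() are hypothetical
and do not exist in runner/executor.c; only waitpid() with WNOHANG,
WIFSIGNALED() and WTERMSIG() are taken from the patch itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical container for what one non-blocking probe learned. */
struct probe_result {
	bool reaped;           /* waitpid() returned the child's pid */
	bool killed_by_signal; /* the child was terminated by a signal */
	int signo;             /* terminating signal, 0 if none */
};

/* Non-blocking check, meant to be called once per monitor loop iteration. */
static struct probe_result probe_child(pid_t child)
{
	struct probe_result r = { false, false, 0 };
	int status;

	if (waitpid(child, &status, WNOHANG) == child) {
		r.reaped = true;
		if (WIFSIGNALED(status)) {
			r.killed_by_signal = true;
			r.signo = WTERMSIG(status);
		}
	}
	return r;
}

/* Abort only when the kill was not the runner's own timeout handling. */
static bool should_abort(bool test_timed_out, struct probe_result r)
{
	return !test_timed_out && r.killed_by_signal;
}

/* Fork a child that just waits, signal it, then poll until it is reaped. */
static struct probe_result demo_kill_and_probe(int signo)
{
	struct probe_result r = { false, false, 0 };
	const struct timespec tick = { 0, 1000000 }; /* 1 ms between probes */
	pid_t child;

	assert(signo > 0);
	child = fork();
	if (child < 0)
		return r; /* fork failed: report "not reaped" */
	if (child == 0) {
		for (;;)
			pause(); /* child: sleep until a signal terminates it */
	}

	kill(child, signo);
	/* Bounded polling loop standing in for the runner's while loop. */
	for (int i = 0; i < 5000 && !r.reaped; i++) {
		r = probe_child(child);
		if (!r.reaped)
			nanosleep(&tick, NULL);
	}
	return r;
}
```

Calling demo_kill_and_probe(SIGTERM) plays the role of the manual kill
from the test notes, and passing test_timed_out = true to should_abort()
models the --per-test-timeout case, where the runner's own kill must not
be reported as an abort.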
diff --git a/runner/executor.c b/runner/executor.c
index ac73e1dde..69b4ed939 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -888,12 +888,15 @@ static int monitor_output(pid_t child,
const int interval_length = 1;
int wd_timeout;
int killed = 0; /* 0 if not killed, signal number otherwise */
+ bool child_reaped = false;
+ bool child_killed_by_signal = false;
struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
unsigned long taints = 0;
bool aborting = false;
size_t disk_usage = 0;
bool socket_comms_used = false; /* whether the test actually uses comms */
bool results_received = false; /* whether we already have test results that might need overriding if we detect an abort condition */
+ bool test_timed_out = false;
igt_gettime(&time_beg);
time_last_activity = time_last_subtest = time_killed = time_beg;
@@ -1233,6 +1236,14 @@ static int monitor_output(pid_t child,
}
}
+ /* Always check for abort conditions */
+ if (child == waitpid(child, &status, WNOHANG)) {
+ child_reaped = true;
+ if (WIFSIGNALED(status)) {
+ child_killed_by_signal = true;
+ killed = WTERMSIG(status);
+ }
+ }
if (sigfd >= 0 && FD_ISSET(sigfd, &set)) {
double time;
@@ -1241,7 +1252,12 @@ static int monitor_output(pid_t child,
errf("Error reading from signalfd: %m\n");
continue;
} else if (siginfo.ssi_signo == SIGCHLD) {
- if (child != waitpid(child, &status, WNOHANG)) {
+ if (!child_reaped) {
+ /* Was child killed since we last checked? */
+ if (child == waitpid(child, &status, WNOHANG))
+ child_reaped = true;
+ }
+ if (!child_reaped) {
errf("Failed to reap child\n");
status = 9999;
} else if (WIFEXITED(status)) {
@@ -1303,7 +1319,6 @@ static int monitor_output(pid_t child,
fdatasync(outputs[_F_JOURNAL]);
}
}
-
aborting = true;
killed = SIGQUIT;
if (!kill_child(killed, child))
@@ -1447,6 +1462,7 @@ static int monitor_output(pid_t child,
disk_usage);
if (timeout_reason) {
+ test_timed_out = true;
if (killed == SIGKILL) {
/* Nothing that can be done, really. Let's tell the caller we want to abort. */
@@ -1485,6 +1501,15 @@ static int monitor_output(pid_t child,
}
}
+ if (!test_timed_out && child_killed_by_signal) {
+ sprintf(buf, "Test terminated by a signal %s (%d).\n",
+ strsignal(killed), -killed);
+ errf("%s", buf);
+
+ *abortreason = strdup(buf);
+ aborting = true;
+ }
+
dump_dmesg(kmsgfd, outputs[_F_DMESG]);
if (settings->sync)
fdatasync(outputs[_F_DMESG]);
--
2.34.1