[PATCH i-g-t v2] runner: Relax timeout reduction on soft lockup
Janusz Krzysztofik
janusz.krzysztofik at linux.intel.com
Tue Jul 8 13:04:15 UTC 2025
When a soft lockup occurs, it can help root cause analysis to see whether
the test was still able to complete despite triggering the soft lockup
warning, or whether the lockup looks unrecoverable without killing the
test. For that to be possible, igt_runner should not kill the test too
promptly when a soft lockup related kernel taint is detected.
On kernel taints, igt_runner currently decreases the per-test and
inactivity timeouts by a factor of 10. Let it check whether the taint is
caused by a soft lockup and, in that case, decrease the timeouts only by
a factor of 2.
v2: Define symbols for taint bits and use them (Kamil)
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
---
lib/igt_taints.c | 8 ++++----
lib/igt_taints.h | 6 ++++++
runner/executor.c | 14 ++++++++++----
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/lib/igt_taints.c b/lib/igt_taints.c
index 6b36d11cba..1d238fd2af 100644
--- a/lib/igt_taints.c
+++ b/lib/igt_taints.c
@@ -13,10 +13,10 @@ static const struct {
int bad;
const char *explanation;
} abort_taints[] = {
- { 4, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
- { 5, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
- { 7, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
- { 9, 1, "TAINT_WARN: WARN_ON has happened." },
+ { TAINT_MACHINE_CHECK, 1, "TAINT_MACHINE_CHECK: Processor reported a Machine Check Exception."},
+ { TAINT_BAD_PAGE, 1, "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags." },
+ { TAINT_DIE, 1, "TAINT_DIE: Kernel has died - BUG/OOPS." },
+ { TAINT_WARN, 1, "TAINT_WARN: WARN_ON has happened." },
{ -1 }
};
diff --git a/lib/igt_taints.h b/lib/igt_taints.h
index be4195c5aa..50c4cf16f8 100644
--- a/lib/igt_taints.h
+++ b/lib/igt_taints.h
@@ -6,6 +6,12 @@
#ifndef __IGT_TAINTS_H__
#define __IGT_TAINTS_H__
+#define TAINT_MACHINE_CHECK 4
+#define TAINT_BAD_PAGE 5
+#define TAINT_DIE 7
+#define TAINT_WARN 9
+#define TAINT_SOFT_LOCKUP 14
+
unsigned long igt_kernel_tainted(unsigned long *taints);
const char *igt_explain_taints(unsigned long *taints);
diff --git a/runner/executor.c b/runner/executor.c
index 13180a0a46..847abe481a 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -871,10 +871,15 @@ static const char *need_to_timeout(struct settings *settings,
if (settings->abort_mask & ABORT_TAINT &&
is_tainted(taints)) {
/* list of timeouts that may postpone immediate kill on taint */
- if (settings->per_test_timeout || settings->inactivity_timeout)
- decrease = 10;
- else
+ if (settings->per_test_timeout || settings->inactivity_timeout) {
+ if (is_tainted(taints) == (1 << TAINT_WARN) &&
+ taints & (1 << TAINT_SOFT_LOCKUP))
+ decrease = 2;
+ else
+ decrease = 10;
+ } else {
return "Killing the test because the kernel is tainted.\n";
+ }
}
if (settings->per_test_timeout != 0 &&
@@ -1526,8 +1531,9 @@ static int monitor_output(pid_t child,
sigfd = -1; /* we are dying, no signal handling for now */
}
+ igt_kernel_tainted(&taints);
timeout_reason = need_to_timeout(settings, killed,
- igt_kernel_tainted(&taints),
+ taints,
igt_time_elapsed(&time_last_activity, &time_now),
igt_time_elapsed(&time_last_subtest, &time_now),
igt_time_elapsed(&time_killed, &time_now),
--
2.50.0
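
For readers following the runner/executor.c hunk, the choice of the timeout
divisor can be sketched in isolation as below. This is a minimal standalone
approximation, not the runner code: sketch_fatal_taints() and
sketch_timeout_divisor() are hypothetical names, the fatal mask is assumed
to consist of the four entries of abort_taints[], and returning 1 when no
fatal taint is set is an assumption made for illustration.

```c
/* Bit positions as defined by this patch in lib/igt_taints.h */
#define TAINT_MACHINE_CHECK	4
#define TAINT_BAD_PAGE		5
#define TAINT_DIE		7
#define TAINT_WARN		9
#define TAINT_SOFT_LOCKUP	14

/*
 * Hypothetical helper: keep only the taint bits assumed to be fatal
 * (the four entries listed in abort_taints[]).
 */
static unsigned long sketch_fatal_taints(unsigned long taints)
{
	const unsigned long fatal = (1UL << TAINT_MACHINE_CHECK) |
				    (1UL << TAINT_BAD_PAGE) |
				    (1UL << TAINT_DIE) |
				    (1UL << TAINT_WARN);

	return taints & fatal;
}

/*
 * Hypothetical helper: pick the divisor applied to the timeouts.
 * Mirrors the condition added by the patch: the only fatal taint is
 * TAINT_WARN and the raw taint word also carries TAINT_SOFT_LOCKUP.
 */
static int sketch_timeout_divisor(unsigned long taints)
{
	unsigned long fatal = sketch_fatal_taints(taints);

	if (!fatal)
		return 1;	/* assumption: untainted, timeouts unchanged */

	if (fatal == (1UL << TAINT_WARN) &&
	    (taints & (1UL << TAINT_SOFT_LOCKUP)))
		return 2;	/* soft lockup: give the test more time */

	return 10;		/* any other fatal taint combination */
}
```

Under these assumptions, a taint word with only bits 9 and 14 set yields a
divisor of 2, so an example timeout of 600 seconds would be cut to 300
seconds rather than 60. A taint word that also carries TAINT_DIE still
yields 10.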