[igt-dev] [PATCH i-g-t 2/2] runner: Don't wait forever for processes to die

Petri Latvala petri.latvala at intel.com
Tue Dec 3 10:44:57 UTC 2019


While the originally written timeout for process killing (2 seconds)
was way too short, waiting indefinitely is suboptimal as well. We're
seeing cases where the test is stuck, possibly for hours, in
uninterruptible sleep (IO). Wait for a considerably longer but still
bounded period of 2 minutes instead: if the process is still only
making progress towards dying after that long, the machine is in a bad
enough state to require a good kicking and booting.

Signed-off-by: Petri Latvala <petri.latvala at intel.com>
Cc: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
---
 runner/executor.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)
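
Note for readers: the bounded wait described above can be sketched in
isolation. This is only a minimal illustration, assuming a directly forked
child that is polled with waitpid(). kill_and_wait_bounded() and its
parameters are hypothetical names that do not exist in the runner, and
monitor_output() itself is structured differently.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of igt_runner: send SIGKILL to a child
 * and poll for it to be reaped, giving up after `intervals` polls spaced
 * `interval_ms` milliseconds apart.
 *
 * Returns true if the child was reaped, false on timeout or error.
 */
static bool kill_and_wait_bounded(pid_t child, int intervals, int interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	int status;
	int i;

	if (kill(child, SIGKILL) != 0)
		return false;

	for (i = 0; i < intervals; i++) {
		pid_t ret = waitpid(child, &status, WNOHANG);

		if (ret == child)
			return true;	/* child died and was reaped */
		if (ret < 0 && errno != EINTR)
			return false;	/* not our child, or already reaped */

		/* Still alive (or interrupted): wait one more interval. */
		nanosleep(&ts, NULL);
	}

	return false;	/* budget exhausted, caller should give up */
}
```

With the numbers from this patch, the equivalent call would be
kill_and_wait_bounded(child, 120 / 20, 20 * 1000): six intervals of 20
seconds each, after which the caller aborts instead of waiting forever.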

diff --git a/runner/executor.c b/runner/executor.c
index e6086772..b58c98e6 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -776,25 +776,29 @@ static int monitor_output(pid_t child,
 				if (!kill_child(killed, child))
 					return -1;
 
-				intervals_left = timeout_intervals = 1;
-				break;
-			case SIGKILL:
 				/*
-				 * If the child still exists, and the kernel
-				 * hasn't oopsed, assume it is still making
-				 * forward progress towards exiting (i.e. still
-				 * freeing all of its resources).
+				 * Allow the test two minutes to die
+				 * on SIGKILL. If it takes more than
+				 * that, we're quite likely in a
+				 * scenario where we want to reboot
+				 * the machine anyway.
 				 */
-				if (kill(child, 0) == 0 && !tainted(&taints)) {
-					intervals_left =  1;
-					break;
-				}
-
+				watchdogs_set_timeout(120);
+				timeout = 20;
+				intervals_left = timeout_intervals = 120 / timeout;
+				break;
+			case SIGKILL:
 				/* Nothing that can be done, really. Let's tell the caller we want to abort. */
+
 				if (settings->log_level >= LOG_LEVEL_NORMAL) {
+					tainted(&taints);
 					errf("Child refuses to die, tainted %lx. Aborting.\n",
 					     taints);
+					if (kill(child, 0) != 0)
+						errf("The test process no longer exists, "
+						     "but we didn't get informed of its demise...\n");
 				}
+
 				close_watchdogs(settings);
 				free(outbuf);
 				close(outfd);
-- 
2.19.1
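
Note on the kill(child, 0) probe used in the SIGKILL case: signal 0 sends
nothing and only performs the existence and permission checks. A minimal
sketch of that idiom follows. process_exists() is a hypothetical name used
for illustration and is not a runner function.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report whether a process with this pid exists.
 * kill() with signal 0 returns 0 if the process exists and we may signal
 * it. On failure, EPERM still means it exists, and ESRCH means it does
 * not. A zombie that has not been reaped yet still counts as existing.
 */
static bool process_exists(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return true;

	return errno == EPERM;
}
```

A return value of 0 from kill(pid, 0) therefore means the process is still
there, and a nonzero return with ESRCH means it is gone.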


