[PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal

Kamil Konieczny kamil.konieczny at linux.intel.com
Wed Sep 11 10:50:12 UTC 2024


Hi Peter,
On 2024-09-11 at 09:06:12 +0200, Peter Senna Tschudin wrote:
> Hi Kamil,
> 
> Thank you for your review.
> 
> On 10.09.2024 17:58, Kamil Konieczny wrote:
> > Hi Peter,
> > On 2024-09-03 at 14:05:05 +0200, Peter Senna Tschudin wrote:
> >> Make igt-runner aware about tests being killed by signals. Before this
> >> patch, manually killing a test process would result in igt-runner silently
> >> marking the test as incomplete.
> >>
> >> Now igt-runner aborts the run verbosely. As an example the following was
> >> extracted from results.json:
> >>  This test caused an abort condition: Test terminated by a signal Killed (-9).
> >>
> >>  v5: do not use sigdescr_np() as it seems to be a fairly new lib function that does
> >>      not compile on older Ubuntu
> >>  v4: improve abort code path to not interfere with igt-runner timeouts
> >>  v3: do not interfere with igt-runner killing tests due to timeout and diskspace
> >>  v2: fix race condition
> >>
> >> Cc: Petri Latvala <adrinael at adrinael.net>
> >> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> >> Signed-off-by: Peter Senna Tschudin <peter.senna at intel.com>
> > 
> > Your signed-off-by is different from your e-mail:
> > Peter Senna Tschudin <peter.senna at linux.intel.com>
> 
> That was a feature, not a bug.
> 

You can have it just place "From: you singned-off-by signature" at begin
of the patch, for example see here:

https://patchwork.freedesktop.org/patch/612712/?series=138273&rev=1

> > 
> > checkpatch.pl script warns:
> > 
> > 100: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Peter Senna Tschudin <peter.senna at linux.intel.com>' != 'Signed-off-by: Peter Senna Tschudin <peter.senna at intel.com>'
> 
> Will fix and not repeat.
> 

Check with checkpatch.pl script before sending.

> > 
> >> ---
> >>  runner/executor.c | 38 +++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 37 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/runner/executor.c b/runner/executor.c
> >> index ac73e1dde..990d932f3 100644
> >> --- a/runner/executor.c
> >> +++ b/runner/executor.c
> >> @@ -888,6 +888,8 @@ static int monitor_output(pid_t child,
> >>  	const int interval_length = 1;
> >>  	int wd_timeout;
> >>  	int killed = 0; /* 0 if not killed, signal number otherwise */
> >> +	bool child_reaped = false;
> >> +	bool child_killed_by_signal = false;
> >>  	struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
> >>  	unsigned long taints = 0;
> >>  	bool aborting = false;
> >> @@ -960,6 +962,25 @@ static int monitor_output(pid_t child,
> >>  
> >>  		igt_gettime(&time_now);
> >>  
> >> +		/* Testing for !killed to prevent aborting too early after igt-runner
> >> +		 * decides to kill a process.
> >> +		 */
> >> +		if (!killed && (child == waitpid(child, &status, WNOHANG))) {
> >> +			child_reaped = true;
> >> +			if (WIFSIGNALED(status)) {
> >> +				child_killed_by_signal = true;
> >> +				killed = WTERMSIG(status);
> >> +
> >> +				/*
> >> +				 * Do not abort just yet, because igt-runner can kill the test
> >> +				 * due to a timeout for example. Aborting here prevents
> >> +				 * igt-runner from reporting a timeout. The code that aborts
> >> +				 * the run after the test was killed is at the end of the
> >> +				 * while() loop.
> >> +				 */
> >> +			}
> >> +		}
> >> +
> > 
> > Why is it here? Imho better place it at second hunk.
> 
> Please notice that the problem is that igt-runner does not check if a child ended due to a signal,
> but it notices when a child ends. The problem with your suggestion is that the loop _may_ decide
> to exercise the old behavior and simply mark the test as incomplete.
> 
> My proposal is to always check for child termination due to a signal, and hence unconditional
> detection before all other logic inside the loop. However there are corner cases, such as
> excessive disk space consumption by a test, in which igt-runner will signal a test, and will
> check for the child being signaled. To not interfere with these corner cases, the detection of
> a signal and taking the action of aborting the run are at different places.

Could you place that detection _after_ a second hunk, maybe just before
loop end? That way you could decide to end a loop there?

Also I am not convinced we should abort runner, thats why I asked
for some predictable tests, possibly with some adjusted test which
will simulate a problem.

> 
> > 
> >>  		/* TODO: Refactor these handlers to their own functions */
> >>  		if (outfd >= 0 && FD_ISSET(outfd, &set)) {
> >>  			char *newline;
> >> @@ -1241,7 +1262,11 @@ static int monitor_output(pid_t child,
> >>  				errf("Error reading from signalfd: %m\n");
> >>  				continue;
> >>  			} else if (siginfo.ssi_signo == SIGCHLD) {
> >> -				if (child != waitpid(child, &status, WNOHANG)) {
> > 
> > This code here do what you wrote at first hunk:
> > 	if (child != waitpid(child, &status, WNOHANG)) {
> > 		// fatal err, set status=9999
> > 	} else { // all 'else if' checks 'child == '
> > 	}
> > 
> > The only difference is that your code calls waitpid() unconditionally
> > before loop. Could you write some testing code in one simple test
> 
> Exactly. Calling waitpid() unconditionally, and early enough to catch all signals.
> 

imho better to do this _after_ siginfo checks

> > and make checks on trybot? e.g. to not rely that your code will
> > catch such scenarios if they are rarly seen but to test them in
> > predictable way. You could also add here some debugs.
> 
> I did not understand what you want to test. The "condition" here is igt-runner itself killing a test due to excessive disk space consumption for example.
> 
> Please let me know how to proceed.

One idea would be to write a HAX test to simulate error conditions
which you are trying to catch.

Regards,
Kamil

> 
> Thank you!
> 
> [...]


More information about the igt-dev mailing list