[PATCH i-g-t] tests/intel/xe_fault_injection: Break early if the test is done

Thu Aug 7 06:14:58 UTC 2025

On 8/4/2025 12:15 PM, Daniele Ceraolo Spurio wrote:
> For the GuC communication injections, the test is currently looping for
> an hardcoded 100 times. In the current driver we call the 2 GuC
> communication functions way less than 100 times each (in one run I've
> counted less than 40 for one and less than 10 for the other), so for
> the majority of the iterations the test is just reloading the driver
> without any injection occurring.
>
> To avoid wasting so much time, we can just break early if we've reached
> the point where no more injections are occurring. If we use a positive
> value for the "times" parameter in the injection framework, the value
> will be decremented every time an injection occur in the given
> iteration, so we can break once we detect that the value is unchanged.
>
> While at it, add an assert to alert us if the number of calls to the
> function increases above our current maximum number of loops, so we know
> that the test needs to be updated.
>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> ---
>   tests/intel/xe_fault_injection.c | 37 +++++++++++++++++++++++++++++---
>   1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> index 8245c558c..00d045cd8 100644
> --- a/tests/intel/xe_fault_injection.c
> +++ b/tests/intel/xe_fault_injection.c
> @@ -27,7 +27,8 @@
>   #define INJECT_ERRNO	-ENOMEM
>   #define BO_ADDR		0x1a0000
>   #define BO_SIZE		(1024*1024)
> -#define INJECT_ITERATIONS	100
> +#define MAX_INJECT_ITERATIONS	100
> +#define MAX_INJECTIONS_PER_ITER	100
>   
>   int32_t inject_iters_raw;
>   struct fault_injection_params {
> @@ -194,6 +195,19 @@ static void cleanup_injection_fault(int sig)
>   	injection_list_clear();
>   }
>   
> +static int get_remaining_injection_count(void)
> +{
> +	int dir, val;
> +
> +	dir = fail_function_open();
> +	igt_assert_lte(0, dir);
> +
> +	val = igt_sysfs_get_s32(dir, "times");
> +
> +	close(dir);
> +	return val;
> +}
> +
>   static void set_retval(const char function_name[], long long retval)
>   {
>   	char path[96];
> @@ -288,14 +302,31 @@ static void probe_fail_guc(int fd, const char pci_slot[], const char function_na
>   	*/
>   	iter = inject_iters_raw;
>   	iter_start = iter ? : 0;
> -	iter_end = iter ? iter + 1 : INJECT_ITERATIONS;
> +	iter_end = iter ? iter + 1 : MAX_INJECT_ITERATIONS;
>   	igt_debug("Injecting error for %d - %d iterations\n", iter_start, iter_end);
>   	for (int i = iter_start; i < iter_end; i++) {
>   		fault_params->space = i;
> +		fault_params->times = MAX_INJECTIONS_PER_ITER;
>   		setup_injection_fault(fault_params);
>   		inject_fault_probe(fd, pci_slot, function_name);
>   		igt_kmod_unbind("xe", pci_slot);
> +
> +		/*
> +		 * if no injection occurred we've tested all the injection
> +		 * points for this function and can therefore stop iterating.
> +		 */
> +		if (get_remaining_injection_count() == MAX_INJECTIONS_PER_ITER)
> +			break;
What about the case where remaining comes back as zero? That would mean 
that there were more than MAX_INJECTIONS_PER_ITER in the current pass so 
we not testing every failure point we otherwise would have. Or are we 
not worried about that?

Basically, we are testing a rolling window of function instance (i) -> 
(i + MAX_INJECTIONS_PER_ITER) where i varies from 0 -> 
MAX_INJECT_ITERATIONS. If both limits are greater than the actual 
instance count, then that becomes simply (i) -> (end) but if the per 
iteration count is less then it is a rolling sub window.

If the intent is to always test from (i) to (end), then there should be 
an assert that remaining does not come back as zero. Or better, the two 
defines should just be a single define called 'MAX_INSTANCE_COUNT' or 
some such. Because the number of iterations and number of injections per 
iteration are really just 'how many times is this function called'. The 
iteration aspect is simply the test mechanism rather than the test 
purpose. The test really being "cause a failure of every call of 
function X from the first to the last".

On the other hand, if we think there is value in testing the rolling 
window case - transient fsilures rather than permanent failures. Then 
the MAX_PER_ITER should be a lot smaller (e.g. 10 or even 1), to ensure 
it does not hit every instance.

Broader question beyond this patch - is the plan to enable iterative 
testing of other functions? It seems odd to test every possible failure 
point of the GuC send/recv functions but only ever test the first 
instance of every other injection point. Seems like everything should do 
the looping.

John.

>   	}
> +
> +	/*
> +	 * In the unlikely case where we haven't covered all the injection
> +	 * points for the function (because there are more of them than
> +	 * MAX_INJECT_ITERATIONS) fail the test so that we know we need to do an
> +	 * update and/or split it in two parts.
> +	 */
> +	igt_assert_f(inject_iters_raw || iter != MAX_INJECT_ITERATIONS,
> +		     "Loop exited without covering all injection points!\n");
>   }
>   
>   /**
> @@ -486,7 +517,7 @@ static int opt_handler(int opt, int opt_index, void *data)
>   	case 'I':
>   		/* Update to 0 if not exported / -ve value */
>   		in_param = atoi(optarg);
> -		if (!in_param || in_param <= 0 || in_param > INJECT_ITERATIONS)
> +		if (!in_param || in_param <= 0 || in_param > MAX_INJECT_ITERATIONS)
>   			inject_iters_raw = 0;
>   		else
>   			inject_iters_raw = in_param;