[PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps during fault injection

Tue Jun 10 15:47:58 UTC 2025

-----Original Message-----
From: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com> 
Sent: Tuesday, June 10, 2025 8:22 AM
To: igt-dev at lists.freedesktop.org
Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Harrison, John C <john.c.harrison at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>; Dugast, Francois <francois.dugast at intel.com>
Subject: [PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps during fault injection
> 
> When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
> functions, the CI test systems are going out of space and crashing. To
> avoid this issue, a new helper function is created and when fault is injected
> into this xe_should_fail_ct_dead_capture() helper function, ct dead capture is
> avoided which suppresses ct dumps in the log.
> 
> Inject fault into xe_should_fail_ct_dead_capture() function along with
> xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() to suppress GUC ct dumps.
> 
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> Suggested-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> Cc: Francois Dugast <francois.dugast at intel.com>
> ---
> Same as https://patchwork.freedesktop.org/series/148416/ which was
> reverted due to change from XE was still in review.
> 
> Test-with: 20250524144613.11970-1-satyanarayana.k.v.p at intel.com
> ---
>  tests/intel/xe_fault_injection.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> index 9fe6bfe35..bbc8e71d6 100644
> --- a/tests/intel/xe_fault_injection.c
> +++ b/tests/intel/xe_fault_injection.c
> @@ -123,6 +123,22 @@ static void injection_list_add(const char function_name[])
>  	close(dir);
>  }
>  
> +static void injection_list_append(const char function_name[])
> +{
> +	int dir, fd, ret;
> +
> +	dir = fail_function_open();
> +	igt_assert_lte(0, dir);
> +
> +	fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> +	ret = write(fd, function_name, strlen(function_name));

On one hand, we should probably guard against openat failing by implementing
some error catching here, I.E.:

"""
	fd = openat(dir, "inject", O_WRONLY | O_APPEND);
	igt_assert_lte(0, fd);
	ret = write(fd, function_name, strlen(function_name));
"""

On the other hand, if we fail to open the file with openat and attempt to pass
the broken fd to write, then the write function would end up failing instead,
so we'd end up catching the error later in this function either way.  So, I
suppose it's not strictly necessary to implement a guard here, though I think
it would be preferred for stylistic reasons.

> +
> +	close(fd);
> +	close(dir);
> +
> +	igt_assert_lte(0, ret);
> +}
> +
>  static void injection_list_remove(const char function_name[])
>  {
>  	int dir;
> @@ -202,6 +218,15 @@ static void set_retval(const char function_name[], long long retval)
>  	close(dir);
>  }
>  
> +static void ignore_fail_dump_in_dmesg(const char function_name[])
> +{
> +	if (!strcmp(function_name, "xe_guc_ct_send_recv") ||
> +	    !strcmp(function_name, "xe_guc_mmio_send_recv")) {

I think this if statement can be shortened.  Try:

"""
	if (!strstr(function_name, "send_recv")) {
		injection_list_append("xe_is_injection_active");
		set_retval("xe_is_injection_active", INJECT_ERRNO);
	}
"""

> +		injection_list_append("xe_is_injection_active");
> +		set_retval("xe_is_injection_active", INJECT_ERRNO);
> +	}
> +}
> +
>  /**
>   * SUBTEST: inject-fault-probe-function-%s
>   * Description: inject an error in the injectable function %arg[1] then
> @@ -236,6 +261,7 @@ inject_fault_probe(int fd, char pci_slot[], const char function_name[])
>  
>  	ignore_faults_in_dmesg(function_name);
>  	injection_list_add(function_name);
> +	ignore_fail_dump_in_dmesg(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>  
>  	igt_kmod_bind("xe", pci_slot);
> @@ -273,6 +299,8 @@ static void probe_fail_guc(int fd, char pci_slot[], const char function_name[],
>  	for (int i = iter_start; i < iter_end; i++) {
>  		fault_params->space = i;
>  		setup_injection_fault(fault_params);
> +		/* Clear the injection list  */
> +		injection_list_clear();

Hmm... I think it might be better to perform a targeted clear, instead of just
wiping the list clean after each run.  To do this, maybe we can have
ignore_fail_dump_in_dmesg return some value.  Perhaps:

"""
static bool ignore_fail_dump_in_dmesg(const char function_name[])
{
	return !strstr(function_name, "send_recv");
}
"""

Then, in inject_fault_probe:

"""
static int
inject_fault_probe(int fd, const char pci_slot[], const char function_name[])
{
	int err = 0;
	bool ignore_fail_dump = ignore_fail_dump_in_dmesg(function_name);
	igt_info("Injecting error \"%s\" (%d) in function \"%s\"\n",
		 strerror(-INJECT_ERRNO), INJECT_ERRNO, function_name);

	ignore_dmesg_errors_from_dut(pci_slot);
	injection_list_add(function_name);
	if (ignore_fail_dump) {
		injection_list_append("xe_is_injection_active");
		set_retval("xe_is_injection_active", INJECT_ERRNO);
	}

	set_retval(function_name, INJECT_ERRNO);

	igt_kmod_bind("xe", pci_slot);

	err = -errno;
	injection_list_remove(function_name);
	if (ignore_fail_dump)
		injection_list_remove("xe_is_injection_active");

	return err;
}
"""

We could alternatively keep the injection_list_append and set_retval as a part of
ignore_fail_dump_in_dmesg, and just check the return value at the end of the
inject_fault_probe function when performing the injection_list_remove.  Or we
can just keep it the way it is currently.  I'll leave that to your discretion.
-Jonathan Cavitt

>  		inject_fault_probe(fd, pci_slot, function_name);
>  		igt_kmod_unbind("xe", pci_slot);
>  	}
> -- 
> 2.43.0
> 
>