[PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps during fault injection

K V P, Satyanarayana satyanarayana.k.v.p at intel.com
Thu Jun 12 07:52:02 UTC 2025


Hi.
> -----Original Message-----
> From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Sent: Tuesday, June 10, 2025 9:18 PM
> To: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-
> dev at lists.freedesktop.org
> Cc: Harrison, John C <john.c.harrison at intel.com>; Dugast, Francois
> <francois.dugast at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Subject: RE: [PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT
> dumps during fault injection
> 
> -----Original Message-----
> From: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>
> Sent: Tuesday, June 10, 2025 8:22 AM
> To: igt-dev at lists.freedesktop.org
> Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>;
> Dugast, Francois <francois.dugast at intel.com>
> Subject: [PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps
> during fault injection
> >
> > When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
> > functions, the CI test systems are going out of space and crashing. To
> > avoid this issue, a new helper function is created and when fault is injected
> > into this xe_should_fail_ct_dead_capture() helper function, ct dead capture
> is
> > avoided which suppresses ct dumps in the log.
> >
> > Inject fault into xe_should_fail_ct_dead_capture() function along with
> > xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() to suppress GUC ct
> dumps.
> >
> > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > Suggested-by: John Harrison <John.C.Harrison at Intel.com>
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Cc: Francois Dugast <francois.dugast at intel.com>
> > ---
> > Same as https://patchwork.freedesktop.org/series/148416/ which was
> > reverted due to change from XE was still in review.
> >
> > Test-with: 20250524144613.11970-1-satyanarayana.k.v.p at intel.com
> > ---
> >  tests/intel/xe_fault_injection.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> > index 9fe6bfe35..bbc8e71d6 100644
> > --- a/tests/intel/xe_fault_injection.c
> > +++ b/tests/intel/xe_fault_injection.c
> > @@ -123,6 +123,22 @@ static void injection_list_add(const char
> function_name[])
> >  	close(dir);
> >  }
> >
> > +static void injection_list_append(const char function_name[])
> > +{
> > +	int dir, fd, ret;
> > +
> > +	dir = fail_function_open();
> > +	igt_assert_lte(0, dir);
> > +
> > +	fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> > +	ret = write(fd, function_name, strlen(function_name));
> 
> On one hand, we should probably guard against openat failing by
> implementing
> some error catching here, I.E.:
> 
> """
> 	fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> 	igt_assert_lte(0, fd);
> 	ret = write(fd, function_name, strlen(function_name));
> """
> 
> On the other hand, if we fail to open the file with openat and attempt to pass
> the broken fd to write, then the write function would end up failing instead,
> so we'd end up catching the error later in this function either way.  So, I
> suppose it's not strictly necessary to implement a guard here, though I think
> it would be preferred for stylistic reasons.
>
Fixed in new version. 
> > +
> > +	close(fd);
> > +	close(dir);
> > +
> > +	igt_assert_lte(0, ret);
> > +}
> > +
> >  static void injection_list_remove(const char function_name[])
> >  {
> >  	int dir;
> > @@ -202,6 +218,15 @@ static void set_retval(const char function_name[],
> long long retval)
> >  	close(dir);
> >  }
> >
> > +static void ignore_fail_dump_in_dmesg(const char function_name[])
> > +{
> > +	if (!strcmp(function_name, "xe_guc_ct_send_recv") ||
> > +	    !strcmp(function_name, "xe_guc_mmio_send_recv")) {
> 
> I think this if statement can be shortened.  Try:
> 
> """
> 	if (!strstr(function_name, "send_recv")) {
> 		injection_list_append("xe_is_injection_active");
> 		set_retval("xe_is_injection_active", INJECT_ERRNO);
> 	}
> """
> 
Fixed in new version.
> > +		injection_list_append("xe_is_injection_active");
> > +		set_retval("xe_is_injection_active", INJECT_ERRNO);
> > +	}
> > +}
> > +
> >  /**
> >   * SUBTEST: inject-fault-probe-function-%s
> >   * Description: inject an error in the injectable function %arg[1] then
> > @@ -236,6 +261,7 @@ inject_fault_probe(int fd, char pci_slot[], const char
> function_name[])
> >
> >  	ignore_faults_in_dmesg(function_name);
> >  	injection_list_add(function_name);
> > +	ignore_fail_dump_in_dmesg(function_name);
> >  	set_retval(function_name, INJECT_ERRNO);
> >
> >  	igt_kmod_bind("xe", pci_slot);
> > @@ -273,6 +299,8 @@ static void probe_fail_guc(int fd, char pci_slot[],
> const char function_name[],
> >  	for (int i = iter_start; i < iter_end; i++) {
> >  		fault_params->space = i;
> >  		setup_injection_fault(fault_params);
> > +		/* Clear the injection list  */
> > +		injection_list_clear();
> 
> Hmm... I think it might be better to perform a targeted clear, instead of just
> wiping the list clean after each run.  To do this, maybe we can have
> ignore_fail_dump_in_dmesg return some value.  Perhaps:
> 
> """
> static bool ignore_fail_dump_in_dmesg(const char function_name[])
> {
> 	return !strstr(function_name, "send_recv");
> }
> """
> 
> Then, in inject_fault_probe:
> 
> """
> static int
> inject_fault_probe(int fd, const char pci_slot[], const char function_name[])
> {
> 	int err = 0;
> 	bool ignore_fail_dump =
> ignore_fail_dump_in_dmesg(function_name);
> 	igt_info("Injecting error \"%s\" (%d) in function \"%s\"\n",
> 		 strerror(-INJECT_ERRNO), INJECT_ERRNO, function_name);
> 
> 	ignore_dmesg_errors_from_dut(pci_slot);
> 	injection_list_add(function_name);
> 	if (ignore_fail_dump) {
> 		injection_list_append("xe_is_injection_active");
> 		set_retval("xe_is_injection_active", INJECT_ERRNO);
> 	}
> 
> 	set_retval(function_name, INJECT_ERRNO);
> 
> 	igt_kmod_bind("xe", pci_slot);
> 
> 	err = -errno;
> 	injection_list_remove(function_name);
> 	if (ignore_fail_dump)
> 		injection_list_remove("xe_is_injection_active");
> 
> 	return err;
> }
> """
> 
> We could alternatively keep the injection_list_append and set_retval as a part
> of
> ignore_fail_dump_in_dmesg, and just check the return value at the end of the
> inject_fault_probe function when performing the injection_list_remove.  Or
> we
> can just keep it the way it is currently.  I'll leave that to your discretion.
> -Jonathan Cavitt
Fixed in new version.
-Satya.
> 
> >  		inject_fault_probe(fd, pci_slot, function_name);
> >  		igt_kmod_unbind("xe", pci_slot);
> >  	}
> > --
> > 2.43.0
> >
> >


More information about the igt-dev mailing list