[PATCH i-g-t v5] tests/intel/xe_fault_injection: Suppress Guc CT dumps during fault injection

K V P, Satyanarayana satyanarayana.k.v.p at intel.com
Thu Jun 12 16:38:00 UTC 2025


> -----Original Message-----
> From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Sent: Thursday, June 12, 2025 7:50 PM
> To: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-dev at lists.freedesktop.org
> Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>; Dugast, Francois <francois.dugast at intel.com>;
> Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Subject: RE: [PATCH i-g-t v5] tests/intel/xe_fault_injection: Suppress Guc CT
> dumps during fault injection
> 
> -----Original Message-----
> From: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> Sent: Thursday, June 12, 2025 1:15 AM
> To: igt-dev at lists.freedesktop.org
> Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>;
> Dugast, Francois <francois.dugast at intel.com>
> Subject: [PATCH i-g-t v5] tests/intel/xe_fault_injection: Suppress Guc CT
> dumps during fault injection
> >
> > When a fault is injected into xe_guc_ct_send_recv() or xe_guc_mmio_send_recv(),
> > the CI test systems run out of space and crash. To avoid this, a new helper
> > function is created; when a fault is injected into this
> > xe_should_fail_ct_dead_capture() helper, the CT dead capture is skipped,
> > which suppresses CT dumps in the log.
> >
> > Inject a fault into xe_should_fail_ct_dead_capture() along with
> > xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() to suppress GuC CT dumps.
> >
> > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > Suggested-by: John Harrison <John.C.Harrison at Intel.com>
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Cc: Francois Dugast <francois.dugast at intel.com>
> > ---
> > Same as https://patchwork.freedesktop.org/series/148416/, which was
> > reverted because the Xe change was still in review.
> >
> > V4 -> V5:
> > - Fixed review comments (Jonathan).
> >
> > Test-with: 20250612080402.22011-1-satyanarayana.k.v.p at intel.com
> > ---
> >  tests/intel/xe_fault_injection.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> > index aa3a3a7c2..8245c558c 100644
> > --- a/tests/intel/xe_fault_injection.c
> > +++ b/tests/intel/xe_fault_injection.c
> > @@ -113,6 +113,22 @@ static void injection_list_add(const char function_name[])
> >  	close(dir);
> >  }
> >
> > +static void injection_list_append(const char function_name[])
> > +{
> > +	int dir, fd, ret;
> > +
> > +	dir = fail_function_open();
> > +	igt_assert_lte(0, dir);
> > +
> > +	fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> > +	igt_assert_lte(0, fd);
> > +	ret = write(fd, function_name, strlen(function_name));
> > +	igt_assert_lte(0, ret);
> > +
> > +	close(fd);
> > +	close(dir);
> > +}
> > +
> >  static void injection_list_remove(const char function_name[])
> >  {
> >  	int dir;
> > @@ -192,6 +208,18 @@ static void set_retval(const char function_name[], long long retval)
> >  	close(dir);
> >  }
> >
> > +static void ignore_fail_dump_in_dmesg(const char function_name[], bool enable)
> > +{
> > +	if (strstr(function_name, "send_recv")) {
> > +		if (enable) {
> > +			injection_list_append("xe_is_injection_active");
> > +			set_retval("xe_is_injection_active", INJECT_ERRNO);
> > +		} else {
> > +			injection_list_remove("xe_is_injection_active");
> > +		}
> > +	}
> > +}
> > +
> >  /**
> >   * SUBTEST: inject-fault-probe-function-%s
> >   * Description: inject an error in the injectable function %arg[1] then
> > @@ -227,11 +255,13 @@ inject_fault_probe(int fd, const char pci_slot[], const char function_name[])
> >  	ignore_dmesg_errors_from_dut(pci_slot);
> >  	injection_list_add(function_name);
> >  	set_retval(function_name, INJECT_ERRNO);
> > +	ignore_fail_dump_in_dmesg(function_name, true);
> >
> >  	igt_kmod_bind("xe", pci_slot);
> >
> >  	err = -errno;
> >  	injection_list_remove(function_name);
> > +	ignore_fail_dump_in_dmesg(function_name, false);
> 
> This is better, though I think there's a stigma against giving a single function a boolean
> mode switch like this.  I don't know where that stigma came from, but it might be
> preferable for you to just break the "false" case out and run it here directly.
> 
> Perhaps something like:
> 
> """
> static bool ignore_fail_dump_in_dmesg(const char function_name[])
> {
> 	bool ret = !!strstr(function_name, "send_recv");
> 
> 	if (ret) {
> 		injection_list_append("xe_is_injection_active");
> 		set_retval("xe_is_injection_active", INJECT_ERRNO);
> 	}
> 	return ret;
> }
> ...
> 	ignore_dmesg_errors_from_dut(pci_slot);
> 	injection_list_add(function_name);
> 	set_retval(function_name, INJECT_ERRNO);
> 	ignore_dump = ignore_fail_dump_in_dmesg(function_name);
> 
> 	igt_kmod_bind("xe", pci_slot);
> 
> 	err = -errno;
> 	injection_list_remove(function_name);
> 	if (ignore_dump)
> 		injection_list_remove("xe_is_injection_active");
> """
> 
> If you do that, you can have my
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> -Jonathan Cavitt
> 
I do not think I am going to implement the above method, as it increases the LOC count
and I am not seeing any benefit in implementing it.
Let me know if you see any benefit in the suggested approach.
-Satya.
> >  	return err;
> >  }
> > --
> > 2.43.0
> >
> >
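
For comparison, the two shapes discussed in this thread can be sketched side by
side. This is only an illustration under stated assumptions: the fail_function
"inject" list is replaced by a hypothetical in-memory array, the set_retval()
call is left out, and the names ignore_fail_dump_v5(), ignore_fail_dump_split()
and injection_list_contains() are invented for the sketch and are not part of
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the debugfs "inject" list. */
#define MAX_INJECTED 8
static const char *injected[MAX_INJECTED];
static size_t injected_count;

static void injection_list_append(const char function_name[])
{
	assert(injected_count < MAX_INJECTED);
	injected[injected_count++] = function_name;
}

static void injection_list_remove(const char function_name[])
{
	for (size_t i = 0; i < injected_count; i++) {
		if (strcmp(injected[i], function_name) == 0) {
			/* Order is irrelevant here: move the last entry into the gap. */
			injected[i] = injected[--injected_count];
			return;
		}
	}
}

/* Helper invented for this sketch so the list state can be inspected. */
static bool injection_list_contains(const char function_name[])
{
	for (size_t i = 0; i < injected_count; i++)
		if (strcmp(injected[i], function_name) == 0)
			return true;
	return false;
}

/* Shape used in v5: one function, a boolean selects arm or disarm. */
static void ignore_fail_dump_v5(const char function_name[], bool enable)
{
	if (!strstr(function_name, "send_recv"))
		return;

	if (enable)
		injection_list_append("xe_is_injection_active");
	else
		injection_list_remove("xe_is_injection_active");
}

/* Shape from the review: arm only, and tell the caller whether it armed. */
static bool ignore_fail_dump_split(const char function_name[])
{
	bool ret = !!strstr(function_name, "send_recv");

	if (ret)
		injection_list_append("xe_is_injection_active");
	return ret;
}
```

With the split variant, the caller keeps the returned bool and calls
injection_list_remove("xe_is_injection_active") itself after the bind attempt.
In this sketch both variants leave the list in the same state, so the
difference is one of call-site style rather than behaviour.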

