[PATCH i-g-t v5 2/3] tests/intel/xe_fault_injection: Inject errors in xe_guc_* calls

K V P, Satyanarayana satyanarayana.k.v.p at intel.com
Tue Apr 15 08:10:27 UTC 2025


Hi
> -----Original Message-----
> From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Sent: Monday, April 14, 2025 8:47 PM
> To: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-
> dev at lists.freedesktop.org
> Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Dugast, Francois
> <francois.dugast at intel.com>; Laguna, Lukasz <lukasz.laguna at intel.com>; De
> Marchi, Lucas <lucas.demarchi at intel.com>; Cavitt, Jonathan
> <jonathan.cavitt at intel.com>
> Subject: RE: [PATCH i-g-t v5 2/3] tests/intel/xe_fault_injection: Inject errors in
> xe_guc_* calls
> 
> -----Original Message-----
> From: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>
> Sent: Monday, April 14, 2025 7:59 AM
> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; igt-
> dev at lists.freedesktop.org
> Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Dugast, Francois
> <francois.dugast at intel.com>; Laguna, Lukasz <lukasz.laguna at intel.com>; De
> Marchi, Lucas <lucas.demarchi at intel.com>; K V P, Satyanarayana
> <satyanarayana.k.v.p at intel.com>
> Subject: RE: [PATCH i-g-t v5 2/3] tests/intel/xe_fault_injection: Inject errors in
> xe_guc_* calls
> >
> > Hi
> > > -----Original Message-----
> > > From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> > > Sent: Monday, April 14, 2025 8:19 PM
> > > To: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-
> > > dev at lists.freedesktop.org
> > > Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Wajdeczko,
> > > Michal <Michal.Wajdeczko at intel.com>; Dugast, Francois
> > > <francois.dugast at intel.com>; Laguna, Lukasz <lukasz.laguna at intel.com>; De
> > > Marchi, Lucas <lucas.demarchi at intel.com>; Cavitt, Jonathan
> > > <jonathan.cavitt at intel.com>
> > > Subject: RE: [PATCH i-g-t v5 2/3] tests/intel/xe_fault_injection: Inject errors in
> > > xe_guc_* calls
> > >
> > > -----Original Message-----
> > > From: igt-dev <igt-dev-bounces at lists.freedesktop.org> On Behalf Of
> > > Satyanarayana K V P
> > > Sent: Monday, April 14, 2025 7:39 AM
> > > To: igt-dev at lists.freedesktop.org
> > > Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Wajdeczko,
> > > Michal <Michal.Wajdeczko at intel.com>; Dugast, Francois
> > > <francois.dugast at intel.com>; Laguna, Lukasz <lukasz.laguna at intel.com>; De
> > > Marchi, Lucas <lucas.demarchi at intel.com>
> > > Subject: [PATCH i-g-t v5 2/3] tests/intel/xe_fault_injection: Inject errors in
> > > xe_guc_* calls
> > > >
> > > > Use the kernel fault injection infrastructure to test error handling
> > > > of xe during driver probe when executing xe_guc_ct_send_recv() /
> > > > xe_guc_mmio_send_recv() so that more code paths are tested, such as
> > > > error handling and unwinding.
> > > >
> > > > All xe_init()-style functions are called just once during driver probe,
> > > > so it is sufficient to fail the first (or every) call to them. The driver
> > > > communicates with the GuC multiple times, and a real failure can happen at
> > > > any of those calls. Hence the need to inject failures in GuC communication
> > > > functions such as guc_mmio_send() or guc_ct_send(), where failing only the
> > > > first call or all calls is not enough: we need to be able to select a
> > > > specific iteration to fail.
> > > >
> > > > To address this problem, an optional input argument is introduced. If the
> > > > argument is not set, an error is injected at each function call in turn,
> > > > from the first up to the maximum number of iterations defined by
> > > > INJECT_ITERATIONS, currently hardcoded as 100. If the input argument is
> > > > set, an error is injected at that specific function call only.
> > > >
> > > > Errors can be injected using:
> > > > igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv
> > > > igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv
> > > >
> > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > > > Cc: Michał Wajdeczko <michal.wajdeczko at intel.com>
> > > > Cc: Francois Dugast <francois.dugast at intel.com>
> > > > Cc: Lukasz Laguna <lukasz.laguna at intel.com>
> > > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > ---
> > > >  tests/intel/xe_fault_injection.c | 75 +++++++++++++++++++++++++++++++-
> > > >  1 file changed, 74 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> > > > index 44c42e83a..adcab2e4d 100644
> > > > --- a/tests/intel/xe_fault_injection.c
> > > > +++ b/tests/intel/xe_fault_injection.c
> > > > @@ -26,7 +26,9 @@
> > > >  #define INJECT_ERRNO	-ENOMEM
> > > >  #define BO_ADDR		0x1a0000
> > > >  #define BO_SIZE		(1024*1024)
> > > > +#define INJECT_ITERATIONS	100
> > > >
> > > > +int32_t inject_iters_raw;
> > > >  struct fault_injection_params {
> > > >  	/* @probability: Likelihood of failure injection, in percent. */
> > > >  	uint32_t probability;
> > > > @@ -233,6 +235,38 @@ inject_fault_probe(int fd, char pci_slot[], const char function_name[])
> > > >  	injection_list_remove(function_name);
> > > >  }
> > > >
> > > > +/**
> > > > + * SUBTEST: probe-fail-guc-%s
> > > > + * Description: inject an error in the injectable function %arg[1] then reprobe driver
> > > > + * Functionality: fault
> > > > + *
> > > > + * arg[1]:
> > > > + * @xe_guc_mmio_send_recv:     Inject an error when calling xe_guc_mmio_send_recv
> > > > + * @xe_guc_ct_send_recv:       Inject an error when calling xe_guc_ct_send_recv
> > > > + */
> > > > +static void probe_fail_guc(int fd, char pci_slot[], const char function_name[],
> > > > +               struct fault_injection_params *fault_params)
> > > > +{
> > > > +	int iter_start = 0, iter_end = 0, iter = 0;
> > > > +
> > > > +	igt_assert(fault_params);
> > > > +
> > > > +	/* inject_iters_raw will have zero if unset / set to <=0 or malformed.
> > > > +	   When set to > 0 it will have iteration number and will run single n-th
> > > > +	   iteration only.
> > > > +	*/
> > > > +	iter = inject_iters_raw;
> > > > +	iter_start = iter ? : 0;
> > > > +	iter_end = iter ? iter + 1 : INJECT_ITERATIONS;
> > > > +	igt_debug("Injecting error for %d - %d iterations\n", iter_start, iter_end);
> > > > +	for (int i = iter_start; i < iter_end; i++) {
> > > > +		fault_params->space = i;
> > >
> > > Is "space" the correct parameter to set here?  From a cursory glance, I'd expect
> > > you'd want to set the "interval" parameter instead.
> > >
> > From the Linux kernel documentation:
> > interval --> specifies the interval between failures.
> > space --> specifies an initial resource "budget", decremented by "size" on
> > each call to should_fail(,size). Failure injection is suppressed until "space"
> > reaches zero.
> > We want to inject an error at a specific call, not at an interval, so space
> > was chosen here.
> 
> Okay.  With that full explanation, the use of space here makes more sense.
> However, I think the full explanation should be added to the
> fault_injection_params struct in patch 1:
> """
> 	/*
> 	 * @space: Specifies an initial resource "budget", decremented by "size"
> 	 * on each call to should_fail(,size).  Failure injection is suppressed
> 	 * until "space" reaches zero.
> 	 */
> 	uint32_t space;
> """
> 
> Or perhaps just:
> """
> 	/* @space: Specifies how many times fault injection is suppressed before first injection */
> 	uint32_t space;
> """
> 
Updated the @space comment as suggested and sent a new series.
> > > > +		setup_injection_fault(fault_params);
> > > > +		inject_fault_probe(fd, pci_slot, function_name);
> > > > +		igt_kmod_unbind("xe", pci_slot);
> > > > +	}
> > > > +}
> > > > +
> > > >  /**
> > > >   * SUBTEST: exec-queue-create-fail-%s
> > > >   * Description: inject an error in function %arg[1] used in exec queue create IOCTL to make it fail
> > > > @@ -407,10 +441,36 @@ oa_add_config_fail(int fd, int sysfs, int devid, const char function_name[])
> > > >  	igt_assert_eq(intel_xe_perf_ioctl(fd, DRM_XE_OBSERVATION_OP_REMOVE_CONFIG, &config_id), 0);
> > > >  }
> > > >
> > > > -igt_main
> > > > +static int opt_handler(int opt, int opt_index, void *data)
> > > > +{
> > > > +	int in_param;
> > > > +	switch (opt) {
> > > > +	case 'I':
> > > > +		/* Update to 0 if not exported / -ve value */
> > > > +		in_param = atoi(optarg);
> > >
> > > I'm assuming that the below igt_main_args is a wrapper that calls getopt,
> > > because otherwise I don't think optarg will be set here.
> > The opt_handler() is called as part of igt_main_args().
> 
> I figured this was the case, but wanted to make sure.
> -Jonathan Cavitt
> 
> > >
> > > > +		if (!in_param || in_param <= 0 || in_param > INJECT_ITERATIONS)
> > > > +			inject_iters_raw = 0;
> > > > +		else
> > > > +			inject_iters_raw = in_param;
> > > > +		break;
> > > > +	default:
> > > > +		return IGT_OPT_HANDLER_ERROR;
> > > > +	}
> > > > +
> > > > +	return IGT_OPT_HANDLER_SUCCESS;
> > > > +}
> > > > +
> > > > +const char *help_str =
> > > > +	"  -I\tIf set, an error will be injected at specific function call.\n\
> > > > +	If not set, an error will be injected in every possible function call\
> > > > +	starting from first up to 100."
> > > > +	;
> > >
> > > NIT:
> > > Semicolon after string should be placed in line with the quotation mark, not on
> > > a newline:
> > Some tests use this approach (e.g. xe_create.c), so I followed the same.
> > Will update in the next version.
> > > """
> > > const char *help_str =
> > > 	"  -I\tIf set, an error will be injected at specific function call.\n\
> > > 	If not set, an error will be injected in every possible function call\
> > > 	starting from first up to 100.";
> > > """
Updated as suggested and sent a new series.
> > > -Jonathan Cavitt
> > >
> > > > +
> > > > +igt_main_args("I:", NULL, help_str, opt_handler, NULL)
> > > >  {
> > > >  	int fd, sysfs;
> > > >  	struct drm_xe_engine_class_instance *hwe;
> > > > +	struct fault_injection_params fault_params;
> > > >  	static uint32_t devid;
> > > >  	char pci_slot[NAME_MAX];
> > > >  	const struct section {
> > > > @@ -469,6 +529,12 @@ igt_main
> > > >  		{ }
> > > >  	};
> > > >
> > > > +	const struct section guc_fail_functions[] = {
> > > > +		{ "xe_guc_mmio_send_recv" },
> > > > +		{ "xe_guc_ct_send_recv" },
> > > > +		{ }
> > > > +	};
> > > > +
> > > >  	igt_fixture {
> > > >  		igt_require(fail_function_injection_enabled());
> > > >  		fd = drm_open_driver(DRIVER_XE);
> > > > @@ -511,6 +577,13 @@ igt_main
> > > >  		igt_subtest_f("inject-fault-probe-function-%s", s->name)
> > > >  			inject_fault_probe(fd, pci_slot, s->name);
> > > >
> > > > +	for (const struct section *s = guc_fail_functions; s->name; s++)
> > > > +		igt_subtest_f("probe-fail-guc-%s", s->name) {
> > > > +			memcpy(&fault_params, &default_fault_params,
> > > > +					sizeof(struct fault_injection_params));
> > > > +			probe_fail_guc(fd, pci_slot, s->name, &fault_params);
> > > > +		}
> > > > +
> > > >  	igt_fixture {
> > > >  		close(sysfs);
> > > >  		drm_close_driver(fd);
> > > > --
> > > > 2.43.0
> > > >
> > > >
> >

