[PATCH v2 3/3] RFC drm/xe: add fault injection for lmem init check

Rodrigo Vivi rodrigo.vivi at intel.com
Tue Mar 19 14:38:06 UTC 2024


On Tue, Mar 19, 2024 at 10:16:47AM +0530, Riana Tauro wrote:
> Hi Rodrigo
> 
> On 3/19/2024 2:45 AM, Rodrigo Vivi wrote:
> > On Fri, Mar 15, 2024 at 03:35:30PM +0530, Riana Tauro wrote:
> > > Add boot-time fault injection for the lmem init check.
> > > This can be triggered by setting the modparam fail_lmem_init:
> > > 
> > > xe.fail_lmem_init=<interval>,<probability>,<space>,<times>
> > 
> > Please let's avoid module parameters as much as we can.
> > 
> > Let's use the CONFIG_FAULT_INJECTION_DEBUG_FS
> > similarly to
> > 
> > fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
> > 
> The lmem init check is done during early probe. We cannot set up debugfs
> before probe completes, so I added the module parameter.

doh! indeed! sorry about that.

> 
> I can try to set static values before injecting the fault if the module
> param is not needed.
> 
> lmem_init_fail.times = 1;
> lmem_init_fail.probability = 100;

no, let's go with the module parameter. It would be good if we could have
something per-device, but there's no way to pass an argument to the bind/probe
operation...

hmm, unless we also require the PCI ID as an input to the param.
The downside is that we would need to parse the string, then build another
string for setup_fault_attr().
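
To make that parsing cost concrete, here is a rough userspace sketch in
plain C, not driver code. The "<pci-id>:" prefix, the struct, and the
function name are all made up for illustration; only the four-field layout
comes from the patch, and the sscanf() format is modeled on what
setup_fault_attr() expects. In the driver, the copied-out remainder would be
the string handed to setup_fault_attr().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a per-device fault request; not an xe type. */
struct lmem_fault_req {
	unsigned long pci_id;	/* device ID the fault should apply to */
	char attr[64];		/* "<interval>,<probability>,<space>,<times>" */
};

/*
 * Parse "<pci-id>:<interval>,<probability>,<space>,<times>".
 * The "<pci-id>:" prefix is an assumed syntax, with the ID in hex.
 * Returns 0 on success, -1 if the string is malformed.
 */
static int parse_fail_lmem_init(const char *param, struct lmem_fault_req *req)
{
	unsigned long interval, probability;
	int space, times, n;
	const char *rest;
	char *end;

	/* First field: hex PCI ID, which must be followed by ':' */
	req->pci_id = strtoul(param, &end, 16);
	if (end == param || *end != ':')
		return -1;
	rest = end + 1;

	/* Check that the remainder has the four comma-separated fields */
	if (sscanf(rest, "%lu,%lu,%d,%d",
		   &interval, &probability, &space, &times) < 4)
		return -1;

	/* Build the second string, the one meant for setup_fault_attr() */
	n = snprintf(req->attr, sizeof(req->attr), "%s", rest);
	if (n < 0 || (size_t)n >= sizeof(req->attr))
		return -1;

	return 0;
}
```

With this, "56a0:1,100,0,1" (an arbitrary example ID) yields pci_id 0x56a0
and the attr string "1,100,0,1", while a string without the prefix is
rejected.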

also, I agree with Himal: an IGT test case is important here.

Thanks,
Rodrigo.

> 
> Thanks
> Riana
> > And then use it like this:
> > 
> > https://lore.kernel.org/all/20240315010843.194335-1-rodrigo.vivi@intel.com/
> > 
> > > 
> > > Setting this makes the lmem init check fail, which causes
> > > the probe to defer.
> > > 
> > > v2: add fault injection (Lucas)
> > > 
> > > Signed-off-by: Riana Tauro <riana.tauro at intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_device.c | 21 +++++++++++++++++++++
> > >   drivers/gpu/drm/xe/xe_module.c |  5 +++++
> > >   drivers/gpu/drm/xe/xe_module.h |  3 +++
> > >   3 files changed, 29 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > index 50473329cce7..393610e95bd1 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > @@ -51,6 +51,10 @@ struct lockdep_map xe_device_mem_access_lockdep_map = {
> > >   };
> > >   #endif
> > > +#ifdef CONFIG_FAULT_INJECTION
> > > +DECLARE_FAULT_ATTR(lmem_init_fail);
> > > +#endif
> > > +
> > >   static int xe_file_open(struct drm_device *dev, struct drm_file *file)
> > >   {
> > >   	struct xe_device *xe = to_xe_device(dev);
> > > @@ -431,6 +435,23 @@ static int wait_for_lmem_ready(struct xe_device *xe)
> > >   	if (IS_SRIOV_VF(xe))
> > >   		return 0;
> > > +#ifdef CONFIG_FAULT_INJECTION
> > > +	/*
> > > +	 * use fault injection to cause a lmem init failure to validate
> > > +	 * deferred probe. Set the verbose to 0 to  avoid dump stack
> > > +	 */
> > > +	if (xe_modparam.fail_lmem_init) {
> > > +		setup_fault_attr(&lmem_init_fail, xe_modparam.fail_lmem_init);
> > > +		lmem_init_fail.verbose = 0;
> > > +		if (should_fail(&lmem_init_fail, 1)) {
> > > +			/* add delay to reduce the number of deferred probe attempts */
> > > +			msleep(500);
> > > +			drm_dbg(&xe->drm, "Fault Injection lmem init failure\n");
> > > +			return -EPROBE_DEFER;
> > > +		}
> > > +	}
> > > +#endif
> > > +
> > >   	if (verify_lmem_ready(gt))
> > >   		return 0;
> > > diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> > > index 110b69864656..c4efbab430a7 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.c
> > > +++ b/drivers/gpu/drm/xe/xe_module.c
> > > @@ -48,6 +48,11 @@ module_param_named_unsafe(force_probe, xe_modparam.force_probe, charp, 0400);
> > >   MODULE_PARM_DESC(force_probe,
> > >   		 "Force probe options for specified devices. See CONFIG_DRM_XE_FORCE_PROBE for details.");
> > > +#ifdef CONFIG_FAULT_INJECTION
> > > +module_param_named_unsafe(fail_lmem_init, xe_modparam.fail_lmem_init, charp, 0400);
> > > +MODULE_PARM_DESC(fail_lmem_init, "Fault injection. fail_lmem_init=<interval>,<probability>,<space>,<times>");
> > > +#endif
> > > +
> > >   struct init_funcs {
> > >   	int (*init)(void);
> > >   	void (*exit)(void);
> > > diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> > > index 88ef0e8b2bfd..ccbeacbc3efb 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.h
> > > +++ b/drivers/gpu/drm/xe/xe_module.h
> > > @@ -18,6 +18,9 @@ struct xe_modparam {
> > >   	char *huc_firmware_path;
> > >   	char *gsc_firmware_path;
> > >   	char *force_probe;
> > > +#if IS_ENABLED(CONFIG_FAULT_INJECTION)
> > > +	char *fail_lmem_init;
> > > +#endif /* CONFIG_FAULT_INJECTION */
> > >   };
> > >   extern struct xe_modparam xe_modparam;
> > > -- 
> > > 2.40.0
> > > 

