[PATCH v5] drm/xe: Add driver load error injection
Francois Dugast
francois.dugast at intel.com
Wed Sep 11 10:40:04 UTC 2024
On Tue, Sep 10, 2024 at 04:33:21PM -0500, Lucas De Marchi wrote:
> On Tue, Sep 10, 2024 at 05:11:34PM GMT, Rodrigo Vivi wrote:
> > On Tue, Sep 10, 2024 at 05:22:41PM +0200, Francois Dugast wrote:
> > > Those new macros inject errors by overriding return codes. They must
> > > manually be called, preferably at the very beginning of the function
> > > that will fault, otherwise if not possible by turning this pattern:
> > >
> > > err = foo();
> > > if (err)
> > > return err;
> > >
> > > into:
> > >
> > > err = foo();
> > > err = xe_device_inject_driver_probe_error_override(xe, err);
> > > if (err)
> > > return err;
> > >
> > > When CONFIG_DRM_XE_DEBUG is not set, this has no effect.
> > >
> > > When CONFIG_DRM_XE_DEBUG is set, the error code at checkpoint X will
> > > be overridden when the module argument inject_driver_probe_error is
> > > set to value X. By doing so, it is possible to test proper error
> > > handling and improve robustness for current and future code. A few
> > > injection points are added in this patch but more need to be added.
> > > One way to use this error injection at driver probe is:
> > >
> > > for i in {1..200}; do
> > > echo "Run $i"
> > > modprobe xe inject_driver_probe_error=$i;
> > > rmmod xe;
> > > done
> >
> > can we have an IGT test so we ensure that CI is tracking and we are working
> > to close the existing issues?
>
> yeah.. that would be great. I think it would make more sense to use
> bind/unbind in igt.
>
> >
> > >
> > > In the future this is expected to be replaced by the infrastructure
> > > provided by fault-inject.h
> >
> > I was taking a look at the fault-inject again. It could easily be a
> > global fault_attr with a module sysfs entry, then during the test
> > you load the module, then unbind the device, then change the fault-inject
> > probability and time and then bind it back, which will reprobe, but now
> > with the fault injected.
> >
> > The only problem with the fault-inject idea is that it would require
> > a very granular thing with multiple fault_attr, one per failure.
>
> when going with a real fault-injection, I'd actually try to cover it per
> function as described here:
>
> https://docs.kernel.org/fault-injection/fault-injection.html
> /sys/kernel/debug/fail_function/inject:
>
> Format: { ‘function-name’ | ‘!function-name’ | ‘’ }
>
> specifies the target function of error injection by name. If the
> function name leads ‘!’ prefix, given function is removed from injection
> list. If nothing specified (‘’) injection list is cleared.
>
> Integration via ALLOW_ERROR_INJECTION() is similar to the
> KUNIT_STATIC_STUB_REDIRECT() we already use.
>
> In my review I didn't bother to go with fault-inject directly because we
> will probably need to refactor the code so the failure points are in
> their own functions. Something we don't have today. Short term it's
> important to fix the current/unknown problems. Mid term we can convert
> things piecemeal.
>
> Are we on the same page?
That is also my intention with this patch: get something in with minimal risk
and changes so we can soon focus on solving the issues it highlights.
In parallel I am preparing an RFC based on fault-inject, with a proposal for how
we can use fail_function and a few real examples from our code, which we can
take more time to discuss thoroughly.
Francois
>
> > But at least this really ensures that we are really testing all the cases
> > with more reliability.
> >
> > I just realized that this i915-style probe injection might have an issue
> > on platforms with multiple discrete devices. Well, the pci subsystem won't
>
> one more reason to go with the bind/unbind. Then you control when it's
> happening and where.
>
> Lucas De Marchi
>
> > probe in parallel, and likely it will be the same order of probe on
> > every module load, but if it doesn't the Nth point of the failure
> > won't be the same every time, so in every load you might stop in a
> > different device and end up with not covering every single entry.
> > Unlikely I know... And I don't believe this should be a blocker
> > to move forward with something...
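
To make the concern above concrete, here is a minimal userspace sketch (not
driver code) of two devices sharing one global module parameter while each
keeps its own checkpoint counter, as in the patch quoted below. The names
fake_xe, fake_probe and load_module, and the three checkpoints per device, are
assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct xe_device: only the per-device counter. */
struct fake_xe {
	int counter;
};

/* Models the global xe_modparam.inject_driver_probe_error. */
static int modparam_inject;

/* Same counter logic as the quoted patch, without logging or err_real. */
static int inject(struct fake_xe *xe)
{
	if (xe->counter >= modparam_inject)
		return 0;		/* disabled, or already past the target */
	if (++xe->counter < modparam_inject)
		return 0;		/* target checkpoint not reached yet */
	modparam_inject = 0;		/* one-shot: disarm after firing */
	return -ENODEV;
}

/* Probe one device through n checkpoints; 0 on success, -ENODEV if injected. */
static int fake_probe(struct fake_xe *xe, int n)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = inject(xe);
		if (err)
			return err;
	}
	modparam_inject = 0;		/* models the _finish() wrapper */
	return 0;
}

/* Probe two devices in the given order and store each result in res[]. */
static void load_module(int target, const int order[2], int res[2])
{
	struct fake_xe devs[2] = { { 0 }, { 0 } };
	int i;

	assert(target >= 0);
	modparam_inject = target;
	for (i = 0; i < 2; i++)
		res[order[i]] = fake_probe(&devs[order[i]], 3);
}
```

Under this model, whichever device probes first is the only one that can see
an injected failure, since the parameter is cleared either when the error
fires or when the first probe finishes. With bind/unbind the device under test
is chosen explicitly, so the ordering question goes away.
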
> >
> > (more below)
> >
> > >
> > > v2: Fix style and build errors, modparam to 0 after probe, rename to
> > > xe_device_inject_driver_probe_error, check type when compiled out,
> > > add _return macro, move some uses to the beginning of the function
> > > v3: Rebase
> > > v4: Improve commit message and comments, keep if/return rather than
> > > change the flow inside the macro (Lucas De Marchi)
> > > v5: Rebase, add comments, keep existing return points (Lucas De Marchi)
> > > Add finish wrapper, move to function beginning for all xe functions
> > > (Michal Wajdeczko) Bolt into i915 error injection (Jani Nikula)
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > Signed-off-by: Francois Dugast <francois.dugast at intel.com>
> > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > ---
> > > drivers/gpu/drm/xe/display/ext/i915_utils.c | 4 +-
> > > drivers/gpu/drm/xe/xe_device.c | 48 +++++++++++++++++++++
> > > drivers/gpu/drm/xe/xe_device.h | 30 +++++++++++++
> > > drivers/gpu/drm/xe/xe_device_types.h | 5 +++
> > > drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c | 5 +++
> > > drivers/gpu/drm/xe/xe_guc.c | 1 +
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 1 +
> > > drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++
> > > drivers/gpu/drm/xe/xe_mmio.c | 5 +++
> > > drivers/gpu/drm/xe/xe_module.c | 17 ++++++++
> > > drivers/gpu/drm/xe/xe_module.h | 3 ++
> > > drivers/gpu/drm/xe/xe_pci.c | 5 +++
> > > drivers/gpu/drm/xe/xe_pm.c | 5 +++
> > > drivers/gpu/drm/xe/xe_sriov.c | 7 ++-
> > > drivers/gpu/drm/xe/xe_sriov_pf.c | 6 +++
> > > drivers/gpu/drm/xe/xe_tile.c | 13 ++++++
> > > drivers/gpu/drm/xe/xe_uc.c | 4 ++
> > > drivers/gpu/drm/xe/xe_wa.c | 8 +++-
> > > drivers/gpu/drm/xe/xe_wopcm.c | 7 ++-
> > > 19 files changed, 172 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/display/ext/i915_utils.c b/drivers/gpu/drm/xe/display/ext/i915_utils.c
> > > index 43b10a2cc508..11d8377a125f 100644
> > > --- a/drivers/gpu/drm/xe/display/ext/i915_utils.c
> > > +++ b/drivers/gpu/drm/xe/display/ext/i915_utils.c
> > > @@ -4,6 +4,7 @@
> > > */
> > >
> > > #include "i915_drv.h"
> > > +#include "xe_device.h"
> > >
> > > bool i915_vtd_active(struct drm_i915_private *i915)
> > > {
> > > @@ -16,11 +17,10 @@ bool i915_vtd_active(struct drm_i915_private *i915)
> > >
> > > #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
> > >
> > > -/* i915 specific, just put here for shutting it up */
> > > int __i915_inject_probe_error(struct drm_i915_private *i915, int err,
> > > const char *func, int line)
> > > {
> > > - return 0;
> > > + return __xe_device_inject_driver_probe_error(i915, err, 0, func, line);
> > > }
> > >
> > > #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > index 449b85035d3a..f22d94ff302e 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > @@ -319,6 +319,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> > > err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
> > > xe->drm.anon_inode->i_mapping,
> > > xe->drm.vma_offset_manager, false, false);
> > > + err = xe_device_inject_driver_probe_error_override(xe, err);
> > > if (WARN_ON(err))
> > > goto err;
> > >
> > > @@ -477,6 +478,7 @@ static int xe_set_dma_info(struct xe_device *xe)
> > > goto mask_err;
> > >
> > > err = dma_set_coherent_mask(xe->drm.dev, DMA_BIT_MASK(mask_size));
> > > + err = xe_device_inject_driver_probe_error_override(xe, err);
> > > if (err)
> > > goto mask_err;
> > >
> > > @@ -498,6 +500,11 @@ static int wait_for_lmem_ready(struct xe_device *xe)
> > > {
> > > struct xe_gt *gt = xe_root_mmio_gt(xe);
> > > unsigned long timeout, start;
> > > + int err;
> > > +
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > >
> > > if (!IS_DGFX(xe))
> > > return 0;
> > > @@ -750,6 +757,8 @@ int xe_device_probe(struct xe_device *xe)
> > > for_each_gt(gt, xe, id)
> > > xe_gt_sanitize_freq(gt);
> > >
> > > + xe_device_inject_driver_probe_error_finish();
> > > +
> > > return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
> > >
> > > err_fini_display:
> > > @@ -1000,3 +1009,42 @@ void xe_device_declare_wedged(struct xe_device *xe)
> > > for_each_gt(gt, xe, id)
> > > xe_gt_declare_wedged(gt);
> > > }
> > > +
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > +/**
> > > + * __xe_device_inject_driver_probe_error - Inject an error during device probe
> > > + * @xe: xe device instance
> > > + * @err_injected: the error to inject
> > > + * @err_real: the error returned by the actual function
> > > + * @func: the name of the function where this is called from
> > > + * @line: the line where this is called from
> > > + *
> > > + * This is not meant to be called directly, only through xe_device_inject_driver_probe_error.
> > > + *
> > > + * Return: err_real if != 0, err_injected otherwise
> >
> > Not just otherwise....
> >
> > Return 0 if this is not the Nth iteration of the requested iterations from
> > modparam.inject_driver_probe_error
> >
> > Return err_injected if in the Nth iteration...
> >
> > > + */
> > > +int __xe_device_inject_driver_probe_error(struct xe_device *xe, int err_injected, int err_real,
> > > + const char *func, int line)
> > > +{
> > > + if (err_real != 0)
> > > + return err_real;
> > > +
> > > + if (xe->inject_driver_probe_error >= xe_modparam.inject_driver_probe_error)
> > > + return 0;
> > > +
> > > + if (++xe->inject_driver_probe_error < xe_modparam.inject_driver_probe_error)
> > > + return 0;
> > > +
> > > + drm_info(&xe->drm, "Injecting failure %d at checkpoint %u [%s:%d]\n",
> > > + err_injected, xe->inject_driver_probe_error, func, line);
> > > +
> > > + xe_modparam.inject_driver_probe_error = 0;
> > > + return err_injected;
> > > +}
> > > +
> > > +void __xe_device_inject_driver_probe_error_finish(void)
> > > +{
> > > + /* After probe finishes, stop checking for error injection */
> > > + xe_modparam.inject_driver_probe_error = 0;
> > > +}
> > > +#endif
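
As a sanity check on the return-value semantics discussed above, the decision
logic of the function can be modelled in userspace. This is only a sketch:
fake_xe, modparam_inject and fake_probe are hypothetical stand-ins, the
drm_info() call is dropped, and the probe is assumed to have exactly three
checkpoints.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in; the field name mirrors the patch, nothing else does. */
struct fake_xe {
	int inject_driver_probe_error;	/* per-device checkpoint counter */
};

/* Models xe_modparam.inject_driver_probe_error. */
static int modparam_inject;

/* Same decision logic as the quoted function, minus drm_info(). */
static int inject(struct fake_xe *xe, int err_injected, int err_real)
{
	if (err_real != 0)
		return err_real;	/* a real failure always wins */
	if (xe->inject_driver_probe_error >= modparam_inject)
		return 0;		/* disabled (0) or already past the target */
	if (++xe->inject_driver_probe_error < modparam_inject)
		return 0;		/* not yet at the requested checkpoint */
	modparam_inject = 0;		/* one-shot: disarm after firing */
	return err_injected;
}

/*
 * Hypothetical probe with three checkpoints. Returns the index (1..3) of the
 * checkpoint that failed, or 0 if the probe ran to completion.
 */
static int fake_probe(int target)
{
	struct fake_xe xe = { 0 };
	int i;

	assert(target >= 0);
	modparam_inject = target;
	for (i = 1; i <= 3; i++)
		if (inject(&xe, -ENODEV, 0))
			return i;
	modparam_inject = 0;		/* models the _finish() wrapper */
	return 0;
}
```

With this model, fake_probe(2) stops at the second checkpoint, while
fake_probe(0) and fake_probe(4) run to completion, which matches the reading
above: 0 is returned unless this is the Nth checkpoint, and a non-zero
err_real is passed through without advancing the counter.
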
> > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > > index 894f04770454..c410e55b6b09 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > @@ -178,4 +178,34 @@ void xe_device_declare_wedged(struct xe_device *xe);
> > > struct xe_file *xe_file_get(struct xe_file *xef);
> > > void xe_file_put(struct xe_file *xef);
> > >
> > > +#define XE_DEVICE_INJECTED_ERR -ENODEV
> > > +#define xe_device_inject_driver_probe_error(__xe) \
> > > + __xe_device_inject_driver_probe_error(__xe, XE_DEVICE_INJECTED_ERR, 0, __func__, __LINE__)
> > > +#define xe_device_inject_driver_probe_error_override(__xe, __err_real) \
> > > + __xe_device_inject_driver_probe_error(__xe, XE_DEVICE_INJECTED_ERR, __err_real, __func__, \
> > > + __LINE__)
> > > +#define xe_device_inject_driver_probe_error_finish() \
> > > + __xe_device_inject_driver_probe_error_finish()
> > > +
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > +
> > > +int __xe_device_inject_driver_probe_error(struct xe_device *xe,
> > > + int err_injected, int err_real,
> > > + const char *func, int line);
> > > +
> > > +void __xe_device_inject_driver_probe_error_finish(void);
> > > +
> > > +#else
> > > +
> > > +static inline int __xe_device_inject_driver_probe_error(struct xe_device *xe,
> > > + int err_injected, int err_real,
> > > + const char *func, int line)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > > +static inline void __xe_device_inject_driver_probe_error_finish(void) {};
> > > +
> > > +#endif
> > > +
> > > #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > index ec7eb7811126..582b8b7cdee4 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -487,6 +487,11 @@ struct xe_device {
> > > int mode;
> > > } wedged;
> > >
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > + /** @inject_driver_probe_error: Counter used for error injection during probe */
> > > + int inject_driver_probe_error;
> > > +#endif
> > > +
> > > #ifdef TEST_VM_OPS_ERROR
> > > /**
> > > * @vm_inject_error_position: inject errors at different places in VM
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c
> > > index 0e23b7ea4f3e..b5da321bbbea 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c
> > > @@ -12,6 +12,7 @@
> > > #include "regs/xe_guc_regs.h"
> > > #include "regs/xe_regs.h"
> > >
> > > +#include "xe_device.h"
> > > #include "xe_mmio.h"
> > > #include "xe_gt_sriov_printk.h"
> > > #include "xe_gt_sriov_pf_helpers.h"
> > > @@ -275,6 +276,10 @@ int xe_gt_sriov_pf_service_init(struct xe_gt *gt)
> > > {
> > > int err;
> > >
> > > + err = xe_device_inject_driver_probe_error(gt_to_xe(gt));
> > > + if (err)
> > > + return err;
> > > +
> > > pf_init_versions(gt);
> > >
> > > err = pf_alloc_runtime_info(gt);
> > > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > > index 5599464013bd..eb764b44ced7 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > > @@ -353,6 +353,7 @@ int xe_guc_init(struct xe_guc *guc)
> > > xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOADABLE);
> > >
> > > ret = devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc);
> > > + ret = xe_device_inject_driver_probe_error_override(guc_to_xe(guc), ret);
> > > if (ret)
> > > goto out;
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 4b95f75b1546..51ffb05605bb 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -202,6 +202,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct)
> > > ct->bo = bo;
> > >
> > > err = drmm_add_action_or_reset(&xe->drm, guc_ct_fini, ct);
> > > + err = xe_device_inject_driver_probe_error_override(xe, err);
> > > if (err)
> > > return err;
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> > > index 034b29984d5e..d27d843057e7 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> > > @@ -1064,6 +1064,10 @@ int xe_guc_pc_init(struct xe_guc_pc *pc)
> > > u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
> > > int err;
> > >
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > if (xe->info.skip_guc_pc)
> > > return 0;
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> > > index 3fd462fda625..a4cf082d3261 100644
> > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > @@ -136,6 +136,11 @@ int xe_mmio_probe_tiles(struct xe_device *xe)
> > > {
> > > size_t tile_mmio_size = SZ_16M;
> > > size_t tile_mmio_ext_size = xe->info.tile_mmio_ext_size;
> > > + int err;
> > > +
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > >
> > > mmio_multi_tile_setup(xe, tile_mmio_size);
> > > mmio_extension_setup(xe, tile_mmio_size, tile_mmio_ext_size);
> > > diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> > > index 77ce9f9ca7a5..3de603e0438f 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.c
> > > +++ b/drivers/gpu/drm/xe/xe_module.c
> > > @@ -56,6 +56,23 @@ module_param_named_unsafe(force_probe, xe_modparam.force_probe, charp, 0400);
> > > MODULE_PARM_DESC(force_probe,
> > > "Force probe options for specified devices. See CONFIG_DRM_XE_FORCE_PROBE for details.");
> > >
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > +/*
> > > + * The error code at checkpoint X will be overridden when the module argument
> > > + * inject_driver_probe_error is set to value X. By doing so, it is possible to
> > > + * test proper error handling and improve robustness for current and future
> > > + * code. One way to test multiple error injection points:
> > > + *
> > > + * for i in {1..200}; do
> > > + * echo "Run $i"
> > > + * modprobe xe inject_driver_probe_error=$i;
> > > + * rmmod xe;
> > > + * done
> > > + */
> > > +module_param_named_unsafe(inject_driver_probe_error, xe_modparam.inject_driver_probe_error, int, 0600);
> >
> > we need to break this line... or perhaps get a smaller word for the param name?
> >
> > > +MODULE_PARM_DESC(inject_driver_probe_error, "Inject driver probe error");
> > > +#endif
> > > +
> > > #ifdef CONFIG_PCI_IOV
> > > module_param_named(max_vfs, xe_modparam.max_vfs, uint, 0400);
> > > MODULE_PARM_DESC(max_vfs,
> > > diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> > > index 161a5e6f717f..47cefaf8d79b 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.h
> > > +++ b/drivers/gpu/drm/xe/xe_module.h
> > > @@ -20,6 +20,9 @@ struct xe_modparam {
> > > char *force_probe;
> > > #ifdef CONFIG_PCI_IOV
> > > unsigned int max_vfs;
> > > +#endif
> > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > + int inject_driver_probe_error;
> > > #endif
> > > int wedged_mode;
> > > };
> > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > > index 3bce0e550a63..9bb60b300727 100644
> > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > @@ -644,8 +644,13 @@ static int xe_info_init(struct xe_device *xe,
> > > u32 graphics_gmdid_revid = 0, media_gmdid_revid = 0;
> > > struct xe_tile *tile;
> > > struct xe_gt *gt;
> > > + int err;
> > > u8 id;
> > >
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > /*
> > > * If this platform supports GMD_ID, we'll detect the proper IP
> > > * descriptor to use from hardware registers. desc->graphics will only
> > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > index 9c59a30d7646..a059be07a11d 100644
> > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > @@ -258,6 +258,7 @@ int xe_pm_init_early(struct xe_device *xe)
> > > return err;
> > >
> > > err = drmm_mutex_init(&xe->drm, &xe->d3cold.lock);
> > > + err = xe_device_inject_driver_probe_error_override(xe, err);
> > > if (err)
> > > return err;
> > >
> > > @@ -276,6 +277,10 @@ int xe_pm_init(struct xe_device *xe)
> > > {
> > > int err;
> > >
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > /* For now suspend/resume is only allowed with GuC */
> > > if (!xe_device_uc_enabled(xe))
> > > return 0;
> > > diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
> > > index 5a1d65e4f19f..c7512d8acc28 100644
> > > --- a/drivers/gpu/drm/xe/xe_sriov.c
> > > +++ b/drivers/gpu/drm/xe/xe_sriov.c
> > > @@ -102,11 +102,13 @@ static void fini_sriov(struct drm_device *drm, void *arg)
> > > */
> > > int xe_sriov_init(struct xe_device *xe)
> > > {
> > > + int err;
> > > +
> > > if (!IS_SRIOV(xe))
> > > return 0;
> > >
> > > if (IS_SRIOV_PF(xe)) {
> > > - int err = xe_sriov_pf_init_early(xe);
> > > + err = xe_sriov_pf_init_early(xe);
> > >
> > > if (err)
> > > return err;
> > > @@ -114,7 +116,8 @@ int xe_sriov_init(struct xe_device *xe)
> > >
> > > xe_assert(xe, !xe->sriov.wq);
> > > xe->sriov.wq = alloc_workqueue("xe-sriov-wq", 0, 0);
> > > - if (!xe->sriov.wq)
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (!xe->sriov.wq || err)
> > > return -ENOMEM;
> > >
> > > return drmm_add_action_or_reset(&xe->drm, fini_sriov, xe);
> > > diff --git a/drivers/gpu/drm/xe/xe_sriov_pf.c b/drivers/gpu/drm/xe/xe_sriov_pf.c
> > > index 0f721ae17b26..8d75bb6570f0 100644
> > > --- a/drivers/gpu/drm/xe/xe_sriov_pf.c
> > > +++ b/drivers/gpu/drm/xe/xe_sriov_pf.c
> > > @@ -80,8 +80,14 @@ bool xe_sriov_pf_readiness(struct xe_device *xe)
> > > */
> > > int xe_sriov_pf_init_early(struct xe_device *xe)
> > > {
> > > + int err;
> > > +
> > > xe_assert(xe, IS_SRIOV_PF(xe));
> > >
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > return drmm_mutex_init(&xe->drm, &xe->sriov.pf.master_lock);
> > > }
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
> > > index dda5268507d8..774668ac67b4 100644
> > > --- a/drivers/gpu/drm/xe/xe_tile.c
> > > +++ b/drivers/gpu/drm/xe/xe_tile.c
> > > @@ -114,6 +114,10 @@ int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id)
> > > {
> > > int err;
> > >
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > tile->xe = xe;
> > > tile->id = id;
> > >
> > > @@ -127,6 +131,15 @@ int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id)
> > >
> > > xe_pcode_init(tile);
> > >
> > > + /*
> > > + * xe_tile_alloc() and xe_gt_alloc() only fail with -ENOMEM.
> > > + * drmm_zalloc() is used so resources will be freed even if
> > > + * an error is injected.
> > > + */
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > > +
> > > return 0;
> > > }
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > > index 0d073a9987c2..6eaef7a3c58e 100644
> > > --- a/drivers/gpu/drm/xe/xe_uc.c
> > > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > > @@ -135,6 +135,10 @@ int xe_uc_init_hwconfig(struct xe_uc *uc)
> > > {
> > > int ret;
> > >
> > > + ret = xe_device_inject_driver_probe_error(uc_to_xe(uc));
> > > + if (ret)
> > > + return ret;
> > > +
> > > /* GuC submission not enabled, nothing to do */
> > > if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > > return 0;
> > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > index 28b7f95b6c2f..8baad6106968 100644
> > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > @@ -825,6 +825,11 @@ int xe_wa_init(struct xe_gt *gt)
> > > struct xe_device *xe = gt_to_xe(gt);
> > > size_t n_oob, n_lrc, n_engine, n_gt, total;
> > > unsigned long *p;
> > > + int err;
> > > +
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (err)
> > > + return err;
> > >
> > > n_gt = BITS_TO_LONGS(ARRAY_SIZE(gt_was));
> > > n_engine = BITS_TO_LONGS(ARRAY_SIZE(engine_was));
> > > @@ -833,7 +838,8 @@ int xe_wa_init(struct xe_gt *gt)
> > > total = n_gt + n_engine + n_lrc + n_oob;
> > >
> > > p = drmm_kzalloc(&xe->drm, sizeof(*p) * total, GFP_KERNEL);
> > > - if (!p)
> > > + err = xe_device_inject_driver_probe_error(xe);
> > > + if (!p || err)
> > > return -ENOMEM;
> > >
> > > gt->wa_active.gt = p;
> > > diff --git a/drivers/gpu/drm/xe/xe_wopcm.c b/drivers/gpu/drm/xe/xe_wopcm.c
> > > index d3a99157e523..70674b30c4c6 100644
> > > --- a/drivers/gpu/drm/xe/xe_wopcm.c
> > > +++ b/drivers/gpu/drm/xe/xe_wopcm.c
> > > @@ -206,6 +206,10 @@ int xe_wopcm_init(struct xe_wopcm *wopcm)
> > > bool locked;
> > > int ret = 0;
> > >
> > > + ret = xe_device_inject_driver_probe_error(xe);
> > > + if (ret)
> > > + return ret;
> > > +
> > > if (!guc_fw_size)
> > > return -EINVAL;
> > >
> > > @@ -252,8 +256,9 @@ int xe_wopcm_init(struct xe_wopcm *wopcm)
> > > guc_wopcm_base / SZ_1K, guc_wopcm_size / SZ_1K);
> > >
> > > check:
> > > + ret = xe_device_inject_driver_probe_error_override(xe, ret);
> > > if (__check_layout(xe, wopcm->size, guc_wopcm_base, guc_wopcm_size,
> > > - guc_fw_size, huc_fw_size)) {
> > > + guc_fw_size, huc_fw_size) && !ret) {
> > > wopcm->guc.base = guc_wopcm_base;
> > > wopcm->guc.size = guc_wopcm_size;
> > > XE_WARN_ON(!wopcm->guc.base);
> > > --
> > > 2.43.0
> > >