[Intel-gfx] [PATCH v3] drm/i915: Add fault injection support

Imre Deak imre.deak at intel.com
Tue Mar 15 14:01:14 UTC 2016


On ti, 2016-03-15 at 09:56 +0100, Daniel Vetter wrote:
> On Mon, Mar 14, 2016 at 04:59:20PM +0200, Imre Deak wrote:
> > Add support for forcing an error at selected places in the driver.
> > As an
> > example add 4 options to fail during driver loading.
> > 
> > Requested by Chris.
> > 
> > v2:
> > - Add fault point for modeset initialization
> > - Print debug message when injecting an error
> > v3:
> > - Rename inject_fault to inject_load_failure, rename the related
> > macros
> >   and helper accordingly (Chris)
> > 
> > CC: Chris Wilson <chris at chris-wilson.co.uk>
> > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > ---
> > 
> > [This depends on
> >  https://lists.freedesktop.org/archives/intel-gfx/2016-March/089597
> > .html]
> > 
> >  drivers/gpu/drm/i915/i915_dma.c    | 12 ++++++++++++
> >  drivers/gpu/drm/i915/i915_drv.h    | 16 ++++++++++++++++
> >  drivers/gpu/drm/i915/i915_params.c |  5 +++++
> >  drivers/gpu/drm/i915/i915_params.h |  1 +
> >  4 files changed, 34 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index a5121cd..7490307 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -370,6 +370,9 @@ static int i915_load_modeset_init(struct
> > drm_device *dev)
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	int ret;
> >  
> > +	if (i915_inject_load_failure(I915_FAIL_INIT_MODESET))
> > +		return -ENODEV;
> > +
> >  	ret = intel_bios_init(dev_priv);
> >  	if (ret)
> >  		DRM_INFO("failed to find VBIOS tables\n");
> > @@ -950,6 +953,9 @@ static int i915_driver_init_early(struct
> > drm_i915_private *dev_priv,
> >  	struct intel_device_info *device_info;
> >  	int ret = 0;
> >  
> > +	if (i915_inject_load_failure(I915_FAIL_INIT_EARLY))
> > +		return -ENODEV;
> > +
> >  	dev_priv->dev = dev;
> >  
> >  	/* Setup the write-once "constant" device info */
> > @@ -1064,6 +1070,9 @@ static int i915_driver_init_mmio(struct
> > drm_i915_private *dev_priv)
> >  	struct drm_device *dev = dev_priv->dev;
> >  	int ret;
> >  
> > +	if (i915_inject_load_failure(I915_FAIL_INIT_MMIO))
> > +		return -ENODEV;
> > +
> >  	if (i915_get_bridge_dev(dev))
> >  		return -EIO;
> 
> Ok, thought a bit more about this, and if we really want to check all the
> load failure paths then this will become extremely verbose. Since we'd
> need to have a bitflag for every return -EFAIL.

I'm not sure if you want to check all failure paths, I think for that
the existing failslab etc. mechanisms are better suited. This new
option would be used at relatively few well defined places. The option
is a mask since Chris wanted the possibility to mix failures (which
makes sense when injecting a non-fatal failure somewhere). If he
doesn't insist on that possibility I can convert the mask option to a
counter based one identifying a single failure spot instead as you
suggest. Chris?

> So maybe another look at
> lib/fault-inject.c is warranted, plus then some macro magic that we could
> to wrap _all_ the checks in our load code like this:
> 
>  	if (i915_load_failure(i915_get_bridge_dev(dev)))
>  		return -EIO;

The above will not work when we want to propagate the error and I
haven't found a nice way to do that with such a helper taking the
function call as argument. I could move the check to the call site
instead of having it inside the called function, if that's what you'd
prefer:

if ((ret = i915_load_failure(...)) < 0 ||
    (ret = i915_driver_init_mmio(...)) < 0)
	return ret;

Although I'm not sure if this makes things clearer. Other suggestions are
welcome.

> i915_load_failure ofc needs to fail if it's argument is false, but it
> could also increment a counter every time it's called and then use that
> counter to inquire the fault injection framework with
> should_fail(i915_load_failures, counter). Abusing a counter for the size
> would allow us to easily restrict fault injection to a certain range of
> faults. We could also expose the final value of that counter in debugfs,
> so that the igt can tune itself.

Yes, I could do this provided Chris doesn't oppose the idea of the
counter based approach.

> Anyway, this is kinda a big plan, but the part I think we should ponder is
> how to make the fault injection less noise and intrusive. A macro to wrap
> existing checks seems like a good idea.

I can switch to using a counter instead of the mask and if there is a
good suggestion about the macro I can use that instead.

--Imre



More information about the Intel-gfx mailing list