[Intel-gfx] [PATCH] drm/i915: Report an error when i915.reset prevents a reset

Mon Jun 22 07:53:28 PDT 2015

On Mon, Jun 22, 2015 at 04:48:26PM +0200, Daniel Vetter wrote:
> On Mon, Jun 22, 2015 at 02:53:39PM +0100, Chris Wilson wrote:
> > On Mon, Jun 22, 2015 at 03:44:51PM +0200, Daniel Vetter wrote:
> > > On Thu, Jun 18, 2015 at 11:42:08AM +0100, Chris Wilson wrote:
> > > > If the user disables the GPU reset using the i915.reset parameter and
> > > > one occurs, report that we failed to reset the GPU. If we return early,
> > > > as we currently do, then we leave all state intact (with a hung GPU)
> > > > and clients block forever waiting for their requests to complete.
> > > > 
> > > > Testcase: igt/gem_eio
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_dma.c     | 1 -
> > > >  drivers/gpu/drm/i915/i915_drv.c     | 3 ---
> > > >  drivers/gpu/drm/i915/intel_uncore.c | 3 +++
> > > >  3 files changed, 3 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > > index 88795d2f1819..c5349fa3fcce 100644
> > > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > > @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data,
> > > >  		break;
> > > >  	case I915_PARAM_HAS_GPU_RESET:
> > > >  		value = i915.enable_hangcheck &&
> > > > -			i915.reset &&
> > > >  			intel_has_gpu_reset(dev);
> > > >  		break;
> > > >  	default:
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > > index 78ef0bb53c36..25ffe8afe744 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > > @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev)
> > > >  	bool simulated;
> > > >  	int ret;
> > > >  
> > > > -	if (!i915.reset)
> > > > -		return 0;
> > > > -
> > > >  	intel_reset_gt_powersave(dev);
> > > >  
> > > >  	mutex_lock(&dev->struct_mutex);
> > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > > index 4a86cf007aa0..f8e75def1a1d 100644
> > > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > > @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev)
> > > >  
> > > >  static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
> > > >  {
> > > > +	if (!i915.reset)
> > > > +		return NULL;
> > > 
> > > Maybe a special reset function which always returns -EIO and prints
> > > something to illuminate the situation to dmesg? Otherwise even more wtf
> > > material in bugzilla ...
> > 
> > It does report a failure (different to reset() itself failing), and reset
> > should be yet another do not touch module option.
> 
> The only one I could spot is conditional upon the hang being a simulated
> one. But just marking i915.reset as unsafe works for me too, I'll amend
> your patch.

	if (ret) {
		DRM_ERROR("Failed to reset chip: %i\n", ret);
		goto error;
	}

ENODEV for anything > gen2 is a giveaway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre