[Intel-gfx] [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged

Fri Jul 6 06:37:53 UTC 2018

Quoting Rodrigo Vivi (2018-07-05 21:44:56)
> On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> > If the GPU is irrecoverably wedged on startup, it means that it failed
> > on initialisation and we have already tried to reset it but failed. We
> > can ignore all further testing, as it is already dead. Failing early
> > prevents us from slowly failing in our endeavours later and timing out.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index fe7d3190ebfe..fca073c96c2d 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> >       if (!intel_has_gpu_reset(i915))
> >               return 0;
> >  
> > +     if (i915_terminally_wedged(&i915->gpu_error))
> > +             return -EIO; /* we're long past hope of a successful reset */
> > +
> 
> Maybe -ENOTRECOVERABLE ?

Interesting choice. Our convention so far has been -EIO for losing state
due to a GPU hang, but an extra flavour for when we wedge the driver?

Hmm, fence->error needs to remain -EIO (differentiating between reset
and wedge for userspace seems to convey no more information, imo), and
we've already baked
	if (i915_terminally_wedged(&i915->gpu_error))
		return -EIO;
into the ABI for the points of interest.
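
To make the convention concrete, here is a rough userspace sketch of the
check order in the patch. Everything prefixed mock_ is a hypothetical
stand-in invented for illustration, not the real i915 types or helpers;
only the errno choice and the order of the two checks come from the
patch above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state; not the real i915 types. */
struct mock_gpu_error {
	bool terminally_wedged;
};

struct mock_gpu {
	bool has_reset;
	struct mock_gpu_error gpu_error;
};

/* Assumed analogue of i915_terminally_wedged() for this sketch only. */
static bool mock_terminally_wedged(const struct mock_gpu_error *error)
{
	return error->terminally_wedged;
}

/* Same order of checks as the patch: skip first, then fail early. */
int mock_hangcheck_selftests(const struct mock_gpu *gpu)
{
	if (!gpu->has_reset)
		return 0; /* no reset support: nothing to test, not a failure */

	if (mock_terminally_wedged(&gpu->gpu_error))
		return -EIO; /* same errno as used for a hang that lost state */

	return 0; /* the real tests would run here */
}
```

A wedged mock device with reset support returns -EIO, while a device
without reset support is skipped with 0 whether or not it is wedged.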

Sadly it is too late; I don't think we can pick another errno for the
cases where it actually matters.
-Chris