[Intel-gfx] [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
Chris Wilson
chris at chris-wilson.co.uk
Fri Jul 6 06:37:53 UTC 2018
Quoting Rodrigo Vivi (2018-07-05 21:44:56)
> On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> > If the GPU is irrecoverably wedged on startup, it means that it failed
> > on initialisation and we have already tried to reset it but failed. We
> > can ignore all further testing, as it is already dead. Failing early
> > prevents us from slowly failing in our endeavours later and timing out.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index fe7d3190ebfe..fca073c96c2d 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> >  	if (!intel_has_gpu_reset(i915))
> >  		return 0;
> >
> > +	if (i915_terminally_wedged(&i915->gpu_error))
> > +		return -EIO; /* we're long past hope of a successful reset */
> > +
>
> Maybe -ENOTRECOVERABLE ?
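The early bail-out in the quoted patch can be modelled outside the kernel as a minimal sketch. Everything here is hypothetical: struct gpu_device, terminally_wedged() and run_hang_subtests() merely stand in for drm_i915_private, i915_terminally_wedged() and the real hangcheck subtests, and are not i915 code. The order of the checks mirrors the patch: no reset support means "skip" (return 0), an already-wedged GPU means "fail now" (-EIO), and only a healthy GPU runs the tests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the error state the patch inspects. */
struct gpu_error {
	bool terminally_wedged; /* set once a reset has already failed */
};

/* Hypothetical stand-in for drm_i915_private. */
struct gpu_device {
	bool has_gpu_reset;
	struct gpu_error gpu_error;
	int tests_run; /* counts how many times the subtests executed */
};

static bool terminally_wedged(const struct gpu_error *error)
{
	return error->terminally_wedged;
}

/* Placeholder for the real subtests; it only records that it ran. */
static int run_hang_subtests(struct gpu_device *dev)
{
	dev->tests_run++;
	return 0;
}

int hangcheck_live_selftests(struct gpu_device *dev)
{
	/* Without reset support there is nothing to test: skip, don't fail. */
	if (!dev->has_gpu_reset)
		return 0;

	/* Already wedged: fail immediately rather than timing out later. */
	if (terminally_wedged(&dev->gpu_error))
		return -EIO;

	return run_hang_subtests(dev);
}
```

With a wedged device the function returns -EIO without touching the subtests, which is the "fail early" behaviour the commit message describes.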
Interesting choice. Our convention so far has been -EIO for losing state
due to a GPU hang, but an extra flavour for when we wedge the driver?
Hmm, fence->error needs to remain -EIO (distinguishing between reset
and wedge for userspace seems to convey no extra information, IMO), and
we've already baked

	if (i915_terminally_wedged(&i915->gpu_error))
		return -EIO;

into the ABI for the points of interest.
Sadly, it's too late: I don't think we can pick another errno for the
cases where it actually matters.
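To illustrate the ABI concern, consider a hypothetical client that only ever checks for -EIO. The enum gpu_status and classify_gpu_error() below are invented for this sketch and are not part of libdrm, Mesa or any real userspace driver; they only show how a caller written against the existing convention would react to a different errno.

```c
#include <errno.h>

/* Hypothetical classification a client might apply to a kernel return code. */
enum gpu_status {
	GPU_OK,
	GPU_LOST_STATE,  /* hang or wedge: context state is gone, recreate it */
	GPU_OTHER_ERROR, /* anything the client does not specifically handle */
};

/*
 * Written against the existing convention: reset-induced state loss and a
 * wedged driver both report -EIO, so a single branch covers both cases.
 */
enum gpu_status classify_gpu_error(int err)
{
	if (err == 0)
		return GPU_OK;
	if (err == -EIO)
		return GPU_LOST_STATE;
	return GPU_OTHER_ERROR;
}
```

Under this assumption, switching the wedged case to -ENOTRECOVERABLE would send such a client down its generic error path instead of its recovery path, which is why changing the errno after the fact would be an ABI break.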
-Chris