[Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+

Thu Jun 18 07:58:06 PDT 2015

On Thu, Jun 18, 2015 at 12:42:55PM +0100, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 12:18:39PM +0100, Tomas Elf wrote:
> > My point was more along the lines of bailing out if the reset
> > request fails and not return an error message but simply keep track
> > of the number of times we've attempted the reset request. By not
> > returning an error we would allow more subsequent hang detections to
> > happen (since the hang is still there), which would end up in the
> > same reset request in the future. If the reset request would fail
> > more times we would simply increment the counter and at one point we
> > would decide that we've had too many unsuccessful reset request
> > attempts and simply go ahead with the reset anyway and if the reset
> > would fail we would return an error at that point in time, which
> > would result in a terminally wedged state. But, yeah, I can see why
> > we shouldn't do this.
> 
> Skipping to the middle!
> 
> I understand the merit in trying the reset a few times before giving up,
> it would just need a bit of restructuring to try the reset before
> clearing gem state (trivial) and requeueing the hangcheck. I am just
> wary of feature creep before we get stuck into TDR, which promises to
> change how we think about resets entirely.

My maintainer concern here is always that we should err on the side of not
killing the machine. If the reset failed, or if the gpu reinit failed, then
marking the gpu as wedged has historically been the safe option. The
system will still run, display mostly works and there's a reasonable
chance you can gather debug data.
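
To make the policy under discussion concrete, here is a rough sketch in plain
C of the retry idea from the quoted text combined with the wedged fallback.
It is not i915 code: every name in it (struct gpu, handle_hang,
MAX_RESET_REQUEST_ATTEMPTS, the simulated request_ok/reset_ok flags) is
hypothetical, and the limit of 3 is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the i915 driver's API. */
#define MAX_RESET_REQUEST_ATTEMPTS 3 /* arbitrary limit, an assumption */

enum gpu_state { GPU_OK, GPU_HUNG, GPU_WEDGED };

struct gpu {
	enum gpu_state state;
	unsigned int failed_requests; /* consecutive failed reset requests */
	bool request_ok; /* simulated: engine acknowledges the reset request */
	bool reset_ok;   /* simulated: reset plus reinit succeeds */
};

/* Called each time hang detection fires; returns the resulting state. */
static enum gpu_state handle_hang(struct gpu *gpu)
{
	if (gpu->state == GPU_WEDGED)
		return gpu->state; /* terminal: no further reset attempts */

	gpu->state = GPU_HUNG;

	if (!gpu->request_ok) {
		/*
		 * Report no error yet. The hang is still there, so hang
		 * detection will fire again and bring us back here.
		 */
		if (++gpu->failed_requests < MAX_RESET_REQUEST_ATTEMPTS)
			return gpu->state;
		/* Too many failed requests: go ahead with the reset anyway. */
	}

	gpu->failed_requests = 0;

	/* A failed reset or reinit falls back to the safe option: wedged. */
	gpu->state = gpu->reset_ok ? GPU_OK : GPU_WEDGED;
	return gpu->state;
}
```

With request_ok and reset_ok both false, the first two calls leave the GPU
marked hung and the third marks it wedged. The machine is never taken down;
the worst case is a wedged GPU.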

We do have i915.reset to disable the reset for these cases, but it's
always a nuisance to have to resort to that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch