[Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+

Daniel Vetter daniel at ffwll.ch
Mon Jun 22 05:50:05 PDT 2015


On Fri, Jun 19, 2015 at 05:30:45PM +0100, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 04:58:06PM +0200, Daniel Vetter wrote:
> > On Thu, Jun 18, 2015 at 12:42:55PM +0100, Chris Wilson wrote:
> > > I understand the merit in trying the reset a few times before giving up,
> > > it would just need a bit of restructuring to try the reset before
> > > clearing gem state (trivial) and requeueing the hangcheck. I am just
> > > wary of feature creep before we get stuck into TDR, which promises to
> > > change how we think about resets entirely.
> > 
> > My maintainer concern here is always that we should err on the side of not
> > killing the machine. If the reset failed, or if the gpu reinit failed then
> > marking the gpu as wedged has historically been the safe option. The
> > system will still run, display mostly works and there's a reasonable
> > chance you can gather debug data.
> 
> One thing to bear in mind here is that with this particular
> don't-reset-if-not-ready logic, repeating the attempt at reset after another
> hangcheck is equivalent to just using a slower hangcheck. (more or less,
> a couple of writes to one register difference) So it is no more likely
> to hang the machine than the original GPU hang.
> 
> We can differentiate the cases here, between say EBUSY, ENODEV, and EIO,
> from the actual reset request to determine which we want to retry
> (i.e. EBUSY).

Tbh I don't want to make the reset code too clever with multiple fallback
paths - it's really tricky code and as-is already suffers from imo
insufficient test coverage and too many bugs. Once we've decided that the gpu
is dead and returned -EIO this should be a terminal state. Developers can
always manually unwedge through debugfs, but for users it's imo paramount
that we don't automatically run some little-tested path and take down
their box in the process.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
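
The policy argued for above (one reset attempt, any failure wedges the GPU,
-EIO is terminal, and only an explicit developer action clears it) can be
sketched roughly as below. This is a standalone toy model, not the i915 code:
struct gpu_model, gpu_model_handle_hang(), gpu_model_manual_unwedge() and the
reset_ok()/reset_busy() callbacks are all hypothetical names made up for
illustration. It deliberately does not retry on -EBUSY, which is the
alternative Chris raises in the quoted text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reset policy; names are illustrative only. */
struct gpu_model {
	bool wedged;         /* terminal state entered after a failed reset */
	unsigned int resets; /* number of reset attempts actually made */
};

/*
 * Automatic hang handling: a single reset attempt with no retry and no
 * fallback path. do_reset stands in for the hardware reset request and
 * returns 0 on success or a negative errno on failure.
 */
static int gpu_model_handle_hang(struct gpu_model *gpu, int (*do_reset)(void))
{
	int ret;

	assert(gpu != NULL && do_reset != NULL);

	/* Terminal state: never re-enter the reset path automatically. */
	if (gpu->wedged)
		return -EIO;

	gpu->resets++;
	ret = do_reset();
	if (ret) {
		/* Any failure, -EBUSY included, wedges the GPU and reports -EIO. */
		gpu->wedged = true;
		return -EIO;
	}

	return 0;
}

/* Explicit developer action, standing in for the debugfs unwedge. */
static void gpu_model_manual_unwedge(struct gpu_model *gpu)
{
	assert(gpu != NULL);
	gpu->wedged = false;
}

/* Stub reset requests used to drive the model. */
static int reset_ok(void)   { return 0; }
static int reset_busy(void) { return -EBUSY; }
```

Driving the model with reset_ok(), then reset_busy(), then reset_ok() again
leaves it wedged after the second call, and the third call returns -EIO
without attempting another reset. Only gpu_model_manual_unwedge() lets the
reset path run again, which is the point of keeping the automatic path short.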