[Intel-gfx] [PATCH] drm/i915: Don't unwedge if reset is disabled

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 9 21:48:42 UTC 2019


Quoting Chris Wilson (2019-09-07 09:39:52)
> Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> > 
> > 
> > On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> > > When trying to reset a device with reset capability disabled or not
> > > supported while rings are full of requests, it has been observed when
> > > running in execlists submission mode that command stream buffer tail
> > > tends to be incremented by apparently still running GPU regardless of
> > > all requests being already cancelled and command stream buffer pointers
> > > reset.  As a result, kernel panic on NULL pointer dereference occurs
> > > when a trace_ports() helper is called with command stream buffer tail
> > > incremented but request pointers being NULL during final
> > > __intel_gt_set_wedged() operation called from intel_gt_reset().
> > > 
> > > Skip actual reset procedure if reset is disabled or not supported.
> > 
> > This last sentence is a bit confusing. You're not skipping the reset 
> > procedure, you're skipping the attempt of unwedging and resetting again 
> > after a reset & wedge already happened.
> 
> Loss of email over the last week, so jumping in at the end. My gut
> response is that this is still just papering over the bug, as what you
> say above makes no sense.

So my gut response was to the run-on sentence, when all you needed to
say was that without a successful reset prior to calling
reset_default_submission, the engine may still generate CS events out of
the blue. And I think the patch should be written to require the
successful reset.
-Chris
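
A minimal sketch of the ordering suggested above, written as a
self-contained toy model rather than actual i915 code. Every name in it
(toy_gt, toy_do_reset, toy_unwedge) is hypothetical, and the assumption
that a reset simply fails when the capability is disabled or unsupported
is a simplification for illustration:

```c
#include <stdbool.h>

/* Toy model, not i915 code: all names here are hypothetical. */
struct toy_gt {
	bool wedged;        /* submission was disabled after a failure */
	bool reset_capable; /* reset is neither disabled nor unsupported */
};

/* Stand-in for the hardware reset; fails when reset is unavailable. */
static bool toy_do_reset(const struct toy_gt *gt)
{
	return gt->reset_capable;
}

/*
 * Restore default submission only after a reset has succeeded, so an
 * engine that was never stopped cannot emit CS events into fresh state.
 */
static bool toy_unwedge(struct toy_gt *gt)
{
	if (!gt->wedged)
		return true;

	if (!toy_do_reset(gt))
		return false; /* no successful reset: stay wedged */

	gt->wedged = false; /* stands in for restoring default submission */
	return true;
}
```

With reset unavailable, toy_unwedge() returns false and leaves the
device wedged; with reset available, it clears the wedged state only
after the reset has reported success.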

