[Intel-gfx] [PATCH 3/5] drm/i915/execlists: Process interrupted context on reset
Chris Wilson
chris at chris-wilson.co.uk
Wed Jul 17 13:40:26 UTC 2019
Quoting Tvrtko Ursulin (2019-07-17 14:31:00)
>
> On 16/07/2019 13:49, Chris Wilson wrote:
> > By stopping the rings, we may trigger an arbitration point resulting in
> > a premature context-switch (i.e. a completion event before the request
> > is actually complete). This clears the active context before the reset,
> > but we must remember to rewind the incomplete context for replay upon
> > resume.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 9b87a2fc186c..7570a9256001 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1419,7 +1419,8 @@ static void process_csb(struct intel_engine_cs *engine)
> > * coherent (visible from the CPU) before the
> > * user interrupt and CSB is processed.
> > */
> > - GEM_BUG_ON(!i915_request_completed(*execlists->active));
> > + GEM_BUG_ON(!i915_request_completed(*execlists->active) &&
> > + !reset_in_progress(execlists));
> > execlists_schedule_out(*execlists->active++);
> >
> > GEM_BUG_ON(execlists->active - execlists->inflight >
> > @@ -2254,7 +2255,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > */
> > rq = execlists_active(execlists);
> > if (!rq)
> > - return;
> > + goto unwind;
> >
> > ce = rq->hw_context;
> > GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > @@ -2331,6 +2332,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > intel_ring_update_space(ce->ring);
> > __execlists_update_reg_state(ce, engine);
> >
> > +unwind:
> > /* Push back any incomplete requests for replay after the reset. */
> > __unwind_incomplete_requests(engine);
> > }
> >
>
> Sounds plausible.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Shouldn't there be a Fixes: tag to go with it?
Yeah, it's rare even by our standards, I think there's a live_hangcheck
failure about once a month that could be the result of this. However,
the result would be an unrecoverable GPU hang as each attempt at
resetting would not see the missing request and so it would remain
perpetually in the engine->active.list until a set-wedged (i.e. suspend
in the user case).
-Chris
More information about the Intel-gfx
mailing list