[Intel-gfx] [PATCH 4/5] drm/i915: Fixup preempt-to-busy vs reset of a virtual request
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 23 13:57:48 UTC 2019
Quoting Tvrtko Ursulin (2019-09-23 14:46:27)
>
> On 21/09/2019 10:55, Chris Wilson wrote:
> > Due to the nature of preempt-to-busy the execlists active tracking and
> > the schedule queue may become temporarily desync'ed (between resubmission
> > to HW and its ack from HW). This means that we may have unwound a
> > request and passed it back to the virtual engine, but it is still
> > inflight on the HW and may even result in a GPU hang. If we detect that
> > GPU hang and try to reset, the hanging request->engine will no longer
> > match the current engine, which means that the request is not on the
> > execlists active list and we should not try to find an older incomplete
> > request. Given that we have deduced this must be a request on a virtual
> > engine, it is the single active request in the context and so must be
> > guilty (as the context is still inflight, it is prevented from being
> > executed on another engine as we process the reset).
> >
> > Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++--
> > drivers/gpu/drm/i915/gt/intel_reset.c | 4 +---
> > 2 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 1b2bacc60300..3eadd294bcd7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2326,11 +2326,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
> >
> > static struct i915_request *active_request(struct i915_request *rq)
> > {
> > - const struct list_head * const list =
> > - &i915_request_active_timeline(rq)->requests;
> > const struct intel_context * const ce = rq->hw_context;
> > struct i915_request *active = NULL;
> > + struct list_head *list;
> >
> > + if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
>
> Is it time to store the veng pointer separately in the request so we can
> add the assert here? Like
> GEM_BUG_ON(!...engine_is_virtual(rq->orig_engine)) ?
intel_engine_is_virtual(rq->hw_context->engine).
But this is currently agnostic, and here we are only asking question is
this request in the engine->active.list, which applies to both normal
and virtual setups. Hmm, this bug does apply to normal reset, I only
spotted it because we added the lockdep warnings that picked up that
rq->engine was different.
There maybe some use for that assertion, but I don't think this is one.
-Chris
More information about the Intel-gfx
mailing list