[Intel-gfx] [PATCH] drm/i915/gt: Confirm the context survives execution

Chris Wilson chris at chris-wilson.co.uk
Wed Oct 14 08:43:10 UTC 2020


Quoting Tvrtko Ursulin (2020-10-14 09:36:08)
> 
> On 13/10/2020 16:35, Chris Wilson wrote:
> > Repeat our sanity checks from before execution to after execution. One
> > expects that if we were to see these, the GPU would already be on fire,
> > but the timing may be informative.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 10 +++++++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 287537089c77..3dbdd5d0cb60 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1216,7 +1216,8 @@ static void intel_engine_context_out(struct intel_engine_cs *engine)
> >   
> >   static void
> >   execlists_check_context(const struct intel_context *ce,
> > -                     const struct intel_engine_cs *engine)
> > +                     const struct intel_engine_cs *engine,
> > +                     const char *when)
> >   {
> >       const struct intel_ring *ring = ce->ring;
> >       u32 *regs = ce->lrc_reg_state;
> > @@ -1251,7 +1252,7 @@ execlists_check_context(const struct intel_context *ce,
> >               valid = false;
> >       }
> >   
> > -     WARN_ONCE(!valid, "Invalid lrc state found before submission\n");
> > +     WARN_ONCE(!valid, "Invalid lrc state found %s submission\n", when);
> >   }
> >   
> >   static void restore_default_state(struct intel_context *ce,
> > @@ -1347,7 +1348,7 @@ __execlists_schedule_in(struct i915_request *rq)
> >               reset_active(rq, engine);
> >   
> >       if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> > -             execlists_check_context(ce, engine);
> > +             execlists_check_context(ce, engine, "before");
> >   
> >       if (ce->tag) {
> >               /* Use a fixed tag for OA and friends */
> > @@ -1418,6 +1419,9 @@ __execlists_schedule_out(struct i915_request *rq,
> >        * refrain from doing non-trivial work here.
> >        */
> >   
> > +     if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> > +             execlists_check_context(ce, engine, "after");
> > +
> 
> CI failures here are either something super scary or a simple mistake 
> which I cannot see. Or is engine retire, possibly queued up earlier, 
> racing with the current schedule_out?

It's the unpark running while process_csb has not yet been flushed, so we
scrub the kernel_context before it is scheduled out. In theory it could be
a real problem, with the scrubbing we do to simulate an issue actually
causing one, but the timing window is quite slim.
-Chris
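
[Editor's illustration, not part of the original message.] The ordering
problem described above can be sketched as a toy model in plain C. This is
not i915 code: every toy_* name, the poison value, and the single-register
"context" are assumptions made for illustration only. It shows why the same
check, labelled "after", fires if the scrub runs before the schedule-out
check and stays quiet if the scrub runs after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scrub pattern; the real driver's scrubbing is not modelled. */
#define TOY_POISON 0xdeadbeefu

/* Toy stand-in for a context image: one register and its expected value. */
struct toy_context {
	uint32_t ring_start;     /* value currently stored in the image */
	uint32_t expected_start; /* value the check expects to find */
};

/*
 * One check, labelled with the point at which it ran, mirroring the
 * "when" argument the patch adds. Returns true if the state is valid.
 */
static bool toy_check_context(const struct toy_context *ce, const char *when)
{
	assert(ce != NULL);

	bool valid = ce->ring_start == ce->expected_start;

	if (!valid)
		fprintf(stderr, "Invalid state found %s submission\n", when);
	return valid;
}

/* Assumed model of a scrub: overwrite the image with a poison value. */
static void toy_scrub(struct toy_context *ce)
{
	ce->ring_start = TOY_POISON;
}

/*
 * Run one schedule-in / schedule-out cycle and return the result of the
 * "after" check. If scrub_before_schedule_out is true, the scrub lands
 * before the schedule-out check, which is the problematic ordering.
 */
static bool toy_run(bool scrub_before_schedule_out)
{
	struct toy_context ce = { .ring_start = 0x1000, .expected_start = 0x1000 };
	bool ok;

	toy_check_context(&ce, "before");

	if (scrub_before_schedule_out)
		toy_scrub(&ce);

	ok = toy_check_context(&ce, "after");

	if (!scrub_before_schedule_out)
		toy_scrub(&ce);

	return ok;
}
```

Calling toy_run(false) models the expected ordering and the "after" check
passes; calling toy_run(true) models the scrub landing first and the
"after" check reports invalid state.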