[Intel-gfx] [PATCH] drm/i915: Clear lost context-switch interrupts across reset

Wed Aug 9 08:50:04 UTC 2017

Quoting Dong, Chuanxiao (2017-08-09 01:26:00)
> > -----Original Message-----
> > From: Thierry, Michel
> > Sent: Tuesday, August 8, 2017 1:40 AM
> > To: Chris Wilson <chris at chris-wilson.co.uk>; intel-gfx at lists.freedesktop.org
> > Cc: Dong, Chuanxiao <chuanxiao.dong at intel.com>; Ursulin, Tvrtko
> > <tvrtko.ursulin at intel.com>; Winiarski, Michal <michal.winiarski at intel.com>
> > Subject: Re: [PATCH] drm/i915: Clear lost context-switch interrupts across
> > reset
> > 
> > On 8/7/2017 5:19 AM, Chris Wilson wrote:
> > > During a global reset, we disable the irq. As we disable the irq, the
> > > hardware may be raising a GT interrupt that we then ignore, leaving it
> > > pending in the GTIIR. After the reset, we then re-enable the irq,
> > > triggering the pending interrupt. However, that interrupt was for the
> > > stale state from before the reset, and the contents of the CSB buffer
> > > are now invalid.
> > >
> > > Reported-by: "Dong, Chuanxiao" <chuanxiao.dong at intel.com>
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: "Dong, Chuanxiao" <chuanxiao.dong at intel.com>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > Cc: Michal Winiarski <michal.winiarski at intel.com>
> > > Cc: Michel Thierry <michel.thierry at intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/intel_lrc.c | 17 ++++++++++++++++-
> > >   1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > > b/drivers/gpu/drm/i915/intel_lrc.c
> > > index b0738d2b2a7f..bc61948e2601 100644
> > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > @@ -1221,6 +1221,14 @@ static int intel_init_workaround_bb(struct
> > intel_engine_cs *engine)
> > >     return ret;
> > >   }
> > >
> > > +static u8 gtiir[] = {
> > > +   [RCS] = 0,
> > > +   [BCS] = 0,
> > > +   [VCS] = 1,
> > > +   [VCS2] = 1,
> > > +   [VECS] = 3,
> > > +};
> > > +
> > >   static int gen8_init_common_ring(struct intel_engine_cs *engine)
> > >   {
> > >     struct drm_i915_private *dev_priv = engine->i915; @@ -1245,9
> > > +1253,16 @@ static int gen8_init_common_ring(struct intel_engine_cs
> > > *engine)
> > >
> > >     DRM_DEBUG_DRIVER("Execlists enabled for %s\n", engine->name);
> > >
> > > -   /* After a GPU reset, we may have requests to replay */
> > > +   GEM_BUG_ON(engine->id >= ARRAY_SIZE(gtiir));
> > > +
> > > +   /* Clear any pending interrupt state */
> > > +   I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]),
> > > +              GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
> > > +   I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]),
> > > +              GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
> > 
> > Clear twice? Or the second was supposed to be user_interrupt?

I never know which of the IIR are double buffered, so in case they
decide to double buffer GTIIR, I cleared it twice.

We do want to service the user_interrupt even after the hang in case we
don't notify it during reset processing. Might be worth kicking the
engine just in case, but we have contingency plans.
-Chris