[Intel-gfx] [PATCH] drm/i915: Restore the kernel context after a GPU reset on an idle engine

Chris Wilson chris at chris-wilson.co.uk
Sat Dec 16 09:41:35 UTC 2017


Quoting Michel Thierry (2017-12-16 00:48:55)
> On 12/15/2017 4:03 PM, Chris Wilson wrote:
> > Part of the system requirement for powersaving is that we always have
> > a context loaded. Upon boot and resume, we load the kernel_context to
> > ensure that some valid state is set before powersaving kicks in; we
> > should do so after a full GPU reset as well. We only need to do so for
> > an idle engine, as any active engine will restart by executing the
> > stuck request, loading its context. For the idle engine, we create a
> > new request to load the kernel_context instead.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 4a7f5579a7a5..189725a8fed6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3119,6 +3119,15 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
> >                  ctx = fetch_and_zero(&engine->last_retired_context);
> >                  if (ctx)
> >                          engine->context_unpin(engine, ctx);
> > +
> > +               if (list_empty(&engine->timeline->requests)) {
> > +                       struct drm_i915_gem_request *rq;
> > +
> > +                       rq = i915_gem_request_alloc(engine,
> > +                                                   dev_priv->kernel_context);
> > +                       if (!IS_ERR(rq))
> > +                               __i915_add_request(rq, false);
> > +               }
> >          }
> > 
> >          i915_gem_restore_fences(dev_priv);
> 
> It shouldn't hurt and if it fixes something,
> 
> Reviewed-by: Michel Thierry <michel.thierry at intel.com>

Indeed, in that run it fixed both of the mystery hangs, on pnv and on bdw+.
So I added a comment noting that it is a mystery fix, both to inspire us
to find the real reason and, in the meantime, to remind us not to remove
it. I think I'm on the right lines with a stale TLB, something like the
lack of INSTPM_TLB_INVALIDATE.
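
For anyone following along, the intent of the patch can be sketched as a
small userspace model. This is only an illustration, not driver code: the
types and helpers below (model_engine, model_reset, model_restart) are
hypothetical stand-ins, and the only behaviour taken from the patch is
"after a full reset, an engine with no requests gets one kernel_context
request queued".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real i915 types. */
enum model_ctx { CTX_NONE = 0, CTX_KERNEL, CTX_USER };

struct model_engine {
	size_t pending_requests;   /* 0 models list_empty(&engine->timeline->requests) */
	enum model_ctx next_ctx;   /* context of the oldest pending request */
	enum model_ctx loaded_ctx; /* context currently loaded on the engine */
};

/*
 * Model of a full GPU reset: every engine loses its loaded context, and
 * any idle engine gets one request queued against the kernel context.
 */
static void model_reset(struct model_engine *engines, size_t count)
{
	assert(engines != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct model_engine *e = &engines[i];

		e->loaded_ctx = CTX_NONE;

		if (e->pending_requests == 0) {
			/* Idle engine: nothing would reload a context, so queue one. */
			e->pending_requests = 1;
			e->next_ctx = CTX_KERNEL;
		}
	}
}

/* Model of the engine restarting: the oldest request loads its context. */
static void model_restart(struct model_engine *e)
{
	assert(e != NULL);

	if (e->pending_requests > 0)
		e->loaded_ctx = e->next_ctx;
}
```

In this model, an engine that was idle at reset ends up with the kernel
context loaded once it restarts, while a busy engine simply reloads the
context of its stuck request, so no engine is left without a context.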

Thanks for the quick review. I've pushed this and its companion, which
avoids resetting an idle engine for per-engine resets. (And fingers
crossed the bandaid holds.)
-Chris

