[Intel-gfx] [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends

Fri May 25 13:17:38 UTC 2018

Quoting Mika Kuoppala (2018-05-25 14:13:19)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > As we reset the GPU on suspend/resume, we also do need to reset the
> > engine state tracking so call into the engine backends. This is
> > especially important so that we can also sanitize the state tracking
> > across resume.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7b5544efa0ba..5a7e0b388ad0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
> >  
> >  void i915_gem_sanitize(struct drm_i915_private *i915)
> >  {
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +
> > +     GEM_TRACE("\n");
> > +
> >       mutex_lock(&i915->drm.struct_mutex);
> > +
> > +     intel_runtime_pm_get(i915);
> > +     intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > +
> > +     /*
> > +      * As we have just resumed the machine and woken the device up from
> > +      * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> > +      * back to defaults, recovering from whatever wedged state we left it
> > +      * in and so worth trying to use the device once more.
> > +      */
> >       if (i915_terminally_wedged(&i915->gpu_error))
> >               i915_gem_unset_wedged(i915);
> >  
> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
> >       if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
> >               WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
> >  
> > +     /* Reset the submission backend after resume as well as the GPU reset */
> > +     for_each_engine(engine, i915, id) {
> > +             if (engine->reset.reset)
> > +                     engine->reset.reset(engine, NULL);
> > +     }
> 
> The NULL guarantees that it won't try to do any funny things
> with the incomplete state.

The NULL is there because this gets called really, really early before
we've finished setting up the engines.

> But what guarantees the timeline cleanup so that
> we don't end up unwinding incomplete-request crap?

To get here we must have gone through at least the start of a suspend.
So we've already cleaned everything up, either nicely or forcefully
through a wedge. Whatever is here is garbage, including any internal
knowledge in the backend about what state we left the machine in.
-Chris
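
For illustration only, here is a minimal userspace sketch of the pattern
discussed above: walk every engine, skip those without a reset hook, and
call the hook with a NULL request so the backend discards its stale
tracking without trying to unwind anything. All names (mock_engine,
mock_request, mock_backend_reset, mock_sanitize) are hypothetical
stand-ins, not the i915 types or functions, and the real backends do
considerably more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the real i915 request type. */
struct mock_request {
	int seqno;
};

/* Hypothetical stand-in for an engine and its backend state tracking. */
struct mock_engine {
	int tracked_inflight;	/* what the backend believes is on the HW */
	int last_unwound;	/* seqno of the last request unwound, or -1 */
	void (*reset)(struct mock_engine *engine, struct mock_request *rq);
};

/*
 * Assumed backend behaviour: always throw away the state tracking, and
 * only touch a request if one was actually passed in.
 */
static void mock_backend_reset(struct mock_engine *engine,
			       struct mock_request *rq)
{
	engine->tracked_inflight = 0;
	if (rq)
		engine->last_unwound = rq->seqno;
}

/*
 * Mirrors the shape of the loop in the patch: engines without a reset
 * hook are skipped, the others are reset with no request to unwind.
 * Returns the number of engines that were reset.
 */
static int mock_sanitize(struct mock_engine *engines, size_t count)
{
	size_t i;
	int nreset = 0;

	assert(engines != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (engines[i].reset) {
			engines[i].reset(&engines[i], NULL);
			nreset++;
		}
	}

	return nreset;
}
```

Passing NULL keeps the caller from having to know whether any request
state is valid yet; in this sketch the backend only has to forget what
it thought was in flight.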