[Intel-gfx] [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands

Chris Wilson chris at chris-wilson.co.uk
Thu Nov 2 08:43:46 UTC 2017


Quoting Jeff McGee (2017-11-01 20:41:13)
> On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-31 22:53:09)
> > > This patch adds per engine reset and recovery (TDR) support when GuC is
> > > used to submit workloads to GPU.
> > > 
> > > In the case of i915 directly submission to ELSP, driver manages hang
> > > detection, recovery and resubmission. With GuC submission these tasks
> > > are shared between driver and GuC. i915 is still responsible for detecting
> > > a hang, and when it does it only requests GuC to reset that Engine. GuC
> > > internally manages acquiring forcewake and idling the engine before
> > > resetting it.
> > > 
> > > Once the reset is successful, i915 takes over again and handles the
> > > resubmission. The scheduler in i915 knows which requests are pending so
> > > after resetting a engine, pending workloads/requests are resubmitted
> > > again.
> > > 
> > > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> > > non-guc function names.
> > > 
> > > v3: Removed debug message about engine restarting from which request,
> > > since the new baseline do it regardless of submission mode. (Chris)
> > > 
> > > v4: Rebase.
> > > 
> > > v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> > > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> > > 
> > > v6: Rename the existing reset engine function and share a similar
> > > interface between guc and non-guc paths (Chris).
> > > 
> > > Signed-off-by: Michel Thierry <michel.thierry at intel.com>
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
> > >  drivers/gpu/drm/i915/i915_drv.h       |  2 ++
> > >  drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
> > >  drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
> > >  drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
> > >  5 files changed, 40 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index af745749509c..359333a423cf 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> > >         goto finish;
> > >  }
> > >  
> > > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> > > +                                       struct intel_engine_cs *engine)
> > > +{
> > > +       return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> > > +}
> > > +
> > >  /**
> > >   * i915_reset_engine - reset GPU engine to recover from a hang
> > >   * @engine: engine to reset
> > > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> > >                 goto out;
> > >         }
> > >  
> > > -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> > > +       if (!engine->i915->guc.execbuf_client)
> > > +               ret = intel_gt_reset_engine(engine->i915, engine);
> > > +       else
> > > +               ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> > > +
> > >         if (ret) {
> > >                 /* If we fail here, we expect to fallback to a global reset */
> > > -               DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> > > +               DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > > +                                (engine->i915->guc.execbuf_client ? "GUC ":""),
> > 
> > A bit overkill on the parentheses there ;)
> > 
> > Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> > interaction?
> > -Chris
> 
> There is one small change needed in the GuC preemption protocol to make it
> compatible with GuC engine reset. I will send that shortly.
> 
> There are also a couple of corner case bugs with engine reset in our current
> firmware versions. We are planning a firmware update to address those. But
> the host-side code here is fine. So...
> 
> Reviewed-by: Jeff McGee <jeff.mcgee at intel.com>

Pushed this series, along with the preempt interaction fix.
Thanks,
-Chris


More information about the Intel-gfx mailing list