[Intel-gfx] [CI 1/5] drm/i915/gt: Try to more gracefully quiesce the system before resets

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 24 06:14:56 UTC 2019


Quoting Ramalingam C (2019-10-24 02:32:01)
> On 2019-10-23 at 14:31:04 +0100, Chris Wilson wrote:
> > If we are doing a normal GPU reset triggered after detecting a long
> > period of stalled work, we can take our time and allow the engines to
> > quiesce. Since we've stopped submission to the engine, and if we wait
> > long enough an innocent context should complete, leaving the engine idle.
> > So by waiting a short amount of time, we should prevent clobbering other
> > users when resetting a stuck context.
> > 
> > Suggested-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Suggested-by: Jon Bloomfield <jon.bloomfield at intel.com>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
> >  3 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index 48df8889a88a..3a3881d5e44b 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
> >         May be 0 to disable the initial spin. In practice, we estimate
> >         the cost of enabling the interrupt (if currently disabled) to be
> >         a few microseconds.
> > +
> > +config DRM_I915_STOP_TIMEOUT
> > +     int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> > +     default 100 # milliseconds
> > +     help
> > +       By stopping submission and sleeping for a short time before resetting
> > +       the GPU, we allow the innocent contexts also on the system to quiesce.
> > +       It is then less likely for a hanging context to cause collateral
> > +       damage as the system is reset in order to recover. The corollary is
> > +       that the reset itself may take longer and so be more disruptive to
> > +       interactive or low latency workloads.
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 0e20713603ec..e4203eb44139 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> >       engine->instance = info->instance;
> >       __sprint_engine_name(engine);
> >  
> > +     engine->props.stop_timeout_ms =
> > +             CONFIG_DRM_I915_STOP_TIMEOUT;
> Compare to previous version where you used the CONFIG variable directly,
> what is the benefit of using it through a variable? So that we could
> alter it in runtime?

That is what the next series proposes. Also it turns out to be more
easily testable if one is able to alter the timeouts for testing.
-Chris


More information about the Intel-gfx mailing list