[Intel-gfx] [PATCH 3/3] drm/i915: Declare the driver wedged if hangcheck makes no progress
Chris Wilson
chris at chris-wilson.co.uk
Tue Jun 5 09:33:45 UTC 2018
Quoting Mika Kuoppala (2018-06-05 10:26:37)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Hangcheck is our backup in case the GPU or the driver gets stuck. It
> > detects when the GPU is not making any progress and issues a GPU reset.
> > However, if the driver is failing to make any progress, we can get
> > ourselves into a situation where we continually try resetting the GPU to
> > no avail. Employ a second timeout such that if we continue to see the
> > same seqno (the stalled engine has made no progress at all) over the
> > course of several hangchecks, declare the driver wedged and attempt to
> > start afresh.
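The escalation described in the commit message can be sketched as a small, self-contained C model. It is not the i915 code. `struct hc_state`, `hc_sample()`, `STALL_TIMEOUT`, `WEDGED_TIMEOUT` and the choice of HZ = 100 are hypothetical stand-ins. The model also treats any seqno change as progress, whereas the real hangcheck derives progress from a per-sample action (see `engine->hangcheck.action` in the diff) before deciding whether to refresh `action_timestamp`.

```c
#include <stdbool.h>

/* Illustrative values only: HZ is assumed to be 100 here (the kernel's HZ
 * is a build-time constant). The timeout ratios mirror the patch. */
#define HZ 100UL
#define STALL_TIMEOUT  (12 * HZ)  /* cf. I915_SEQNO_DEAD_TIMEOUT */
#define WEDGED_TIMEOUT (60 * HZ)  /* cf. I915_ENGINE_WEDGED_TIMEOUT */

/* Wraparound-safe "a is later than b", modelled on the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical, simplified per-engine hangcheck state. */
struct hc_state {
	unsigned int seqno;             /* seqno seen at the last sample */
	unsigned long action_timestamp; /* tick when progress was last seen */
	bool stalled;                   /* no progress for STALL_TIMEOUT: reset */
	bool wedged;                    /* no progress for WEDGED_TIMEOUT: give up */
};

/* Take one hangcheck sample at time @now. A seqno change counts as
 * progress and restarts both timers; otherwise the verdicts escalate,
 * first to "stalled" and, much later, to "wedged". */
static void hc_sample(struct hc_state *hc, unsigned int seqno,
		      unsigned long now)
{
	if (seqno != hc->seqno) {
		hc->seqno = seqno;
		hc->action_timestamp = now;
	}
	hc->stalled = time_after(now, hc->action_timestamp + STALL_TIMEOUT);
	hc->wedged = time_after(now, hc->action_timestamp + WEDGED_TIMEOUT);
}
```

Because both verdicts are measured from the same timestamp in this model, `wedged` can only become true on an engine that has already been `stalled`, with no progress at all, for the further 48 seconds between the two timeouts.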
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_debugfs.c | 5 +++--
> > drivers/gpu/drm/i915/i915_drv.h | 2 ++
> > drivers/gpu/drm/i915/intel_hangcheck.c | 17 ++++++++++++++++-
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++-
> > 4 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 15e86d34a81c..2eae3abb0ec1 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -1359,11 +1359,12 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
> > seq_printf(m, " seqno = %x [current %x, last %x]\n",
> > engine->hangcheck.seqno, seqno[id],
> > intel_engine_last_submit(engine));
> > - seq_printf(m, " waiters? %s, fake irq active? %s, stalled? %s\n",
> > + seq_printf(m, " waiters? %s, fake irq active? %s, stalled? %s, wedged? %s\n",
> > yesno(intel_engine_has_waiter(engine)),
> > yesno(test_bit(engine->id,
> > &dev_priv->gpu_error.missed_irq_rings)),
> > - yesno(engine->hangcheck.stalled));
> > + yesno(engine->hangcheck.stalled),
> > + yesno(engine->hangcheck.wedged));
> >
> > spin_lock_irq(&b->rb_lock);
> > for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 38157df6ff5c..a4ed0baeb0ed 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -994,6 +994,8 @@ struct i915_gem_mm {
> > #define I915_ENGINE_DEAD_TIMEOUT (4 * HZ) /* Seqno, head and subunits dead */
> > #define I915_SEQNO_DEAD_TIMEOUT (12 * HZ) /* Seqno dead with active head */
> >
> > +#define I915_ENGINE_WEDGED_TIMEOUT (60 * HZ) /* Reset but no recovery? */
> > +
> > enum modeset_restore {
> > MODESET_ON_LID_OPEN,
> > MODESET_DONE,
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index d47e346bd49e..2fc7a0dd0df9 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -294,6 +294,7 @@ static void hangcheck_store_sample(struct intel_engine_cs *engine,
> > engine->hangcheck.seqno = hc->seqno;
> > engine->hangcheck.action = hc->action;
> > engine->hangcheck.stalled = hc->stalled;
> > + engine->hangcheck.wedged = hc->wedged;
> > }
> >
> > static enum intel_engine_hangcheck_action
> > @@ -368,6 +369,9 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
> >
> > hc->stalled = time_after(jiffies,
> > engine->hangcheck.action_timestamp + timeout);
> > + hc->wedged = time_after(jiffies,
> > + engine->hangcheck.action_timestamp +
> > + I915_ENGINE_WEDGED_TIMEOUT);
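The new check reuses `time_after()`, so the sum `action_timestamp + I915_ENGINE_WEDGED_TIMEOUT` may overflow without breaking the comparison. The sketch below contrasts a naive `>` with the signed-difference trick that the kernel's `time_after()` is built on. `naive_after()` and `wrap_safe_after()` are hypothetical names, and converting an out-of-range unsigned value to `long` is implementation-defined in ISO C (the kernel relies on GCC's two's-complement behaviour).

```c
#include <limits.h>
#include <stdbool.h>

/* Naive comparison: gives the wrong answer once the deadline has
 * wrapped past ULONG_MAX but the current time has not. */
static bool naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}

/* Wraparound-safe comparison in the style of the kernel's time_after():
 * the unsigned difference is reinterpreted as signed, which is correct
 * as long as the two times are less than LONG_MAX ticks apart. */
static bool wrap_safe_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}
```

With a timestamp just below `ULONG_MAX`, the deadline wraps to a small value; `naive_after()` then reports it as already passed, while `wrap_safe_after()` correctly waits until the counter itself has wrapped past it.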
>
> As init_hangcheck() will clear the wedged state,
> our reset failure paths need to steer clear of this.
They do by definition (since the reset failed :)
> Would it be better if we warned on wedged being set
> in the init paths and explicitly cleared it in unset_wedged?
I don't think so; if we reset, we may as well regard the reset as
completed. We detect repeated attempts to do the same reset; it's just
that we need a backstop in case we don't do any reset at all.
-Chris