[Intel-gfx] [PATCH] drm/i915: Treat i915_reset_engine() as guilty until proven innocent
Chris Wilson
chris at chris-wilson.co.uk
Fri Apr 6 21:30:41 UTC 2018
Quoting Michel Thierry (2018-04-06 22:23:21)
> And I thought we believed in presumption of innocence...
>
> On 4/6/2018 2:00 PM, Chris Wilson wrote:
> > If we are resetting just one engine, we know it has stalled. So we can
> > pass the stalled parameter directly to i915_gem_reset_engine(), which
> > alleviates the necessity to poke at the generic engine->hangcheck.stalled
> > magic variable, leaving that under control of hangcheck as its name
> > implies. Other than simplifying by removing the indirect parameter along
> > this path, this allows us to introduce new reset mechanisms that run
> > independently of hangcheck.
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry at intel.com>
> > Cc: Jeff McGee <jeff.mcgee at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_drv.c | 2 +-
> > drivers/gpu/drm/i915/i915_drv.h | 3 +-
> > drivers/gpu/drm/i915/i915_gem.c | 36 +++++++++----------
> > .../gpu/drm/i915/selftests/intel_hangcheck.c | 9 -----
> > 4 files changed, 20 insertions(+), 30 deletions(-)
> >
> ...
> > @@ -774,7 +766,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
> > break;
> > }
> >
> > - engine->hangcheck.stalled = false;
> > count++;
> >
> > if (rq) {
> >
>
> Are the ones in igt_handle_error() still needed?
> hangcheck.stalled = true;
> hangcheck.seqno = intel_engine_get_seqno(engine);
>
> Because igt_handle_error is sending a real request.
> (I think the only ones remaining in the selftest should be in
> fake_hangcheck).
Right, fake_hangcheck definitely still needs it to behave like
hangcheck.
i915_handle_error() is still "odd". At the moment, yes we still need to
be poking where we shouldn't. If i915_handle_error() uses the
engine_mask to do per-engine resets, no, we don't need the magic
hangcheck.stalled. But, if it falls back to the full device level, we
loose the guilty reset. So we do get a difference in behaviour, that
really hasn't been noticed before as the only real caller is from
hangcheck. (i915_wedged, I dare anyone to say what they expect ;)
I think the answer will be to pass engine_mask to i915_reset. But I
haven't fleshed that out yet. I think it means we do away with
hangcheck.seqno as well, so bonus?
-Chris
More information about the Intel-gfx
mailing list