[Intel-gfx] [PATCH] drm/i915: Treat i915_reset_engine() as guilty until proven innocent

Chris Wilson chris at chris-wilson.co.uk
Fri Apr 6 22:05:03 UTC 2018


Quoting Michel Thierry (2018-04-06 22:44:34)
> On 4/6/2018 2:30 PM, Chris Wilson wrote:
> > Quoting Michel Thierry (2018-04-06 22:23:21)
> >> And I thought we believed in presumption of innocence...
> >>
> >> On 4/6/2018 2:00 PM, Chris Wilson wrote:
> >>> If we are resetting just one engine, we know it has stalled. So we can
> >>> pass the stalled parameter directly to i915_gem_reset_engine(), which
> >>> alleviates the necessity to poke at the generic engine->hangcheck.stalled
> >>> magic variable, leaving that under control of hangcheck as its name
> >>> implies. Other than simplifying by removing the indirect parameter along
> >>> this path, this allows us to introduce new reset mechanisms that run
> >>> independently of hangcheck.
> >>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Michel Thierry <michel.thierry at intel.com>
> >>> Cc: Jeff McGee <jeff.mcgee at intel.com>
> >>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_drv.c               |  2 +-
> >>>    drivers/gpu/drm/i915/i915_drv.h               |  3 +-
> >>>    drivers/gpu/drm/i915/i915_gem.c               | 36 +++++++++----------
> >>>    .../gpu/drm/i915/selftests/intel_hangcheck.c  |  9 -----
> >>>    4 files changed, 20 insertions(+), 30 deletions(-)
> >>>
> >> ...
> >>> @@ -774,7 +766,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
> >>>                                break;
> >>>                        }
> >>>    
> >>> -                     engine->hangcheck.stalled = false;
> >>>                        count++;
> >>>    
> >>>                        if (rq) {
> >>>
> >>
> >> Are the ones in igt_handle_error() still needed?
> >>     hangcheck.stalled = true;
> >>     hangcheck.seqno = intel_engine_get_seqno(engine);
> >>
> >> Because igt_handle_error is sending a real request.
> > 
> >> (I think the only ones remaining in the selftest should be in
> >> fake_hangcheck).
> > 
> > Right, fake_hangcheck definitely still needs it to behave like
> > hangcheck.
> > 
> > i915_handle_error() is still "odd". At the moment, yes we still need to
> > be poking where we shouldn't. If i915_handle_error() uses the
> > engine_mask to do per-engine resets, no, we don't need the magic
> > hangcheck.stalled. But, if it falls back to the full device level, we
> > loose the guilty reset.  So we do get a difference in behaviour, that
> > really hasn't been noticed before as the only real caller is from
> > hangcheck. (i915_wedged, I dare anyone to say what they expect ;)
> > 
> 
> Not the first time I forget about the full reset path.
> Thanks for explaining it.
> 
> > I think the answer will be to pass engine_mask to i915_reset. But I
> > haven't fleshed that out yet. I think it means we do away with
> > hangcheck.seqno as well, so bonus?

Wasn't quite as hairy as I feared... At least the selftests were easy
enough to convert over!
-Chris


More information about the Intel-gfx mailing list