[Intel-gfx] [PATCH] drm/i915: Treat i915_reset_engine() as guilty until proven innocent
Michel Thierry
michel.thierry at intel.com
Fri Apr 6 21:44:34 UTC 2018
On 4/6/2018 2:30 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2018-04-06 22:23:21)
>> And I thought we believed in presumption of innocence...
>>
>> On 4/6/2018 2:00 PM, Chris Wilson wrote:
>>> If we are resetting just one engine, we know it has stalled. So we can
>>> pass the stalled parameter directly to i915_gem_reset_engine(), which
>>> alleviates the necessity to poke at the generic engine->hangcheck.stalled
>>> magic variable, leaving that under control of hangcheck as its name
>>> implies. Other than simplifying by removing the indirect parameter along
>>> this path, this allows us to introduce new reset mechanisms that run
>>> independently of hangcheck.
>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>> Cc: Jeff McGee <jeff.mcgee at intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_drv.c | 2 +-
>>> drivers/gpu/drm/i915/i915_drv.h | 3 +-
>>> drivers/gpu/drm/i915/i915_gem.c | 36 +++++++++----------
>>> .../gpu/drm/i915/selftests/intel_hangcheck.c | 9 -----
>>> 4 files changed, 20 insertions(+), 30 deletions(-)
>>>
>> ...
>>> @@ -774,7 +766,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>>> break;
>>> }
>>>
>>> - engine->hangcheck.stalled = false;
>>> count++;
>>>
>>> if (rq) {
>>>
>>
>> Are the ones in igt_handle_error() still needed?
>> hangcheck.stalled = true;
>> hangcheck.seqno = intel_engine_get_seqno(engine);
>>
>> Because igt_handle_error is sending a real request.
>
>> (I think the only ones remaining in the selftest should be in
>> fake_hangcheck).
>
> Right, fake_hangcheck definitely still needs it to behave like
> hangcheck.
>
> i915_handle_error() is still "odd". At the moment, yes we still need to
> be poking where we shouldn't. If i915_handle_error() uses the
> engine_mask to do per-engine resets, no, we don't need the magic
> hangcheck.stalled. But, if it falls back to the full device level, we
> loose the guilty reset. So we do get a difference in behaviour, that
> really hasn't been noticed before as the only real caller is from
> hangcheck. (i915_wedged, I dare anyone to say what they expect ;)
>
Not the first time I forget about the full reset path.
Thanks for explaining it.
> I think the answer will be to pass engine_mask to i915_reset. But I
> haven't fleshed that out yet. I think it means we do away with
> hangcheck.seqno as well, so bonus?
More information about the Intel-gfx
mailing list