[Intel-gfx] [PATCH 4/4] drm/i915/gem: Always test execution status on closing the context
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Sep 25 13:23:32 UTC 2020
On 25/09/2020 11:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-09-24 15:26:56)
>>
>> On 16/09/2020 10:42, Chris Wilson wrote:
>>> Verify that if a context is active at the time it is closed, that it is
>>> either persistent and preemptible (with hangcheck running) or it shall
>>> be removed from execution.
>>>
>>> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
>>> Testcase: igt/gem_ctx_persistence/heartbeat-close
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> Cc: <stable at vger.kernel.org> # v5.7+
>>> ---
>>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 48 +++++----------------
>>> 1 file changed, 10 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index a548626fa8bc..4fd38101bb56 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -390,24 +390,6 @@ __context_engines_static(const struct i915_gem_context *ctx)
>>> return rcu_dereference_protected(ctx->engines, true);
>>> }
>>>
>>> -static bool __reset_engine(struct intel_engine_cs *engine)
>>> -{
>>> - struct intel_gt *gt = engine->gt;
>>> - bool success = false;
>>> -
>>> - if (!intel_has_reset_engine(gt))
>>> - return false;
>>> -
>>> - if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
>>> - >->reset.flags)) {
>>> - success = intel_engine_reset(engine, NULL) == 0;
>>> - clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
>>> - >->reset.flags);
>>> - }
>>> -
>>> - return success;
>>> -}
>>> -
>>> static void __reset_context(struct i915_gem_context *ctx,
>>> struct intel_engine_cs *engine)
>>> {
>>> @@ -431,12 +413,7 @@ static bool __cancel_engine(struct intel_engine_cs *engine)
>>> * kill the banned context, we fallback to doing a local reset
>>> * instead.
>>> */
>>> - if (IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT) &&
>>> - !intel_engine_pulse(engine))
>>> - return true;
>>> -
>>> - /* If we are unable to send a pulse, try resetting this engine. */
>>> - return __reset_engine(engine);
>>> + return intel_engine_pulse(engine) == 0;
>>
>> Where is the path now which actually resets the currently running
>> workload (engine) of a non-persistent context? Pulse will be sent and
>> then rely on hangcheck but otherwise let it run?
>
> If the pulse fails, we just call intel_handle_error() on the engine. On
> looking at this code again, I could not justify the open-coding of the
> engine reset fragment of the general error handler, especially as we end
> up taking that route anyway for device resets. (Unlike inside the
> tasklet, there's no atomicity concerns on this engine-reset path.)
I think yesterday I got lost in boolean logic and flows here. Today it
looks fine. Bool ban will be true and code will indeed enter the
__kill_context path.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
More information about the Intel-gfx
mailing list