[Intel-gfx] [PATCH 2/4] drm/i915: Cancel outstanding work after disabling heartbeats on an engine

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Fri Sep 25 11:04:09 UTC 2020


Quoting Chris Wilson (2020-09-16 12:42:17)
> We only allow persistent requests to remain on the GPU past the closure
> of their containing context (and process) so long as they are continuously
> checked for hangs or allow other requests to preempt them, as we need to
> ensure forward progress of the system. If we allow persistent contexts
> to remain on the system after the hangcheck mechanism is disabled,
> the system may grind to a halt. On disabling the mechanism, we sent a
> pulse along the engine to remove all executing contexts from the engine
> which would check for hung contexts -- but we did not prevent those
> contexts from being resubmitted if they survived the final hangcheck.
> 
> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> Testcase: igt/gem_ctx_persistence/heartbeat-stop
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: <stable at vger.kernel.org> # v5.7+

Definitely makes sense to ensure.

Acked-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>

Regards, Joonas

> ---
>  drivers/gpu/drm/i915/gt/intel_engine.h | 9 +++++++++
>  drivers/gpu/drm/i915/i915_request.c    | 5 +++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 08e2c000dcc3..7c3a1012e702 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -337,4 +337,13 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>         return intel_engine_has_preemption(engine);
>  }
>  
> +static inline bool
> +intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
> +{
> +       if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
> +               return false;
> +
> +       return READ_ONCE(engine->props.heartbeat_interval_ms);
> +}
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 436ce368ddaa..0e813819b041 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -542,8 +542,13 @@ bool __i915_request_submit(struct i915_request *request)
>         if (i915_request_completed(request))
>                 goto xfer;
>  
> +       if (unlikely(intel_context_is_closed(request->context) &&
> +                    !intel_engine_has_heartbeat(engine)))
> +               intel_context_set_banned(request->context);
> +
>         if (unlikely(intel_context_is_banned(request->context)))
>                 i915_request_set_error_once(request, -EIO);
> +
>         if (unlikely(fatal_error(request->fence.error)))
>                 __i915_request_skip(request);
>  
> -- 
> 2.20.1
> 
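
For anyone reading along, the submit-time decision this patch adds can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: the mock_* types and functions are hypothetical stand-ins and not the i915 structures, a heartbeat interval of 0 is taken to mean "disabled", and the fatal_error() check is reduced to a comparison against -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the i915 types. */
struct mock_engine {
	unsigned long heartbeat_interval_ms;	/* assumed: 0 == heartbeats off */
};

struct mock_context {
	bool closed;	/* the owning context (or process) has gone away */
	bool banned;	/* no further payloads from this context may run */
};

struct mock_request {
	struct mock_context *context;
	int error;	/* 0 or a negative errno */
	bool skipped;	/* payload dropped instead of being executed */
};

static bool mock_engine_has_heartbeat(const struct mock_engine *engine)
{
	return engine->heartbeat_interval_ms != 0;
}

/*
 * Follows the order of checks in the patched __i915_request_submit():
 * ban closed contexts that can no longer be hangchecked, flag requests
 * of banned contexts with -EIO, then skip requests carrying that error.
 * Returns true if the payload would still be executed.
 */
static bool mock_request_submit(struct mock_request *rq,
				const struct mock_engine *engine)
{
	assert(rq != NULL && rq->context != NULL && engine != NULL);

	if (rq->context->closed && !mock_engine_has_heartbeat(engine))
		rq->context->banned = true;

	/* Like i915_request_set_error_once(): keep any earlier error. */
	if (rq->context->banned && rq->error == 0)
		rq->error = -EIO;

	/* Simplification: only -EIO is treated as fatal in this sketch. */
	if (rq->error == -EIO)
		rq->skipped = true;

	return !rq->skipped;
}
```

In this model a request from a closed context is still executed while the engine has a non-zero heartbeat interval, and is banned and skipped on its next submission once the interval has been set to 0, which is the resubmission case the commit message describes.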

