[Intel-gfx] [PATCH 05/20] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress
Chris Wilson
chris at chris-wilson.co.uk
Wed Jan 13 12:49:39 PST 2016
On Wed, Jan 13, 2016 at 05:28:17PM +0000, Arun Siluvery wrote:
> From: Tomas Elf <tomas.elf at intel.com>
>
> i915_gem_check_wedge now returns a non-zero result in three different cases:
>
> 1. Legacy: A hang has been detected and full GPU reset is in progress.
>
> 2. Per-engine recovery:
>
> a. A single engine reference can be passed to the function, in which
> case only that engine is checked. If that particular engine has been
> detected as hung and is about to be reset, the result is non-zero;
> a reset in progress on any other engine does not affect it.
>
> b. No engine reference is passed to the function, in which case all
> engines are checked for ongoing per-engine hang recovery.
>
> Also, i915_wait_request was updated to take advantage of this new
> functionality. This is important since the TDR hang recovery mechanism
> needs a way to force waiting threads that hold struct_mutex to release
> it and retry after the hang recovery has completed. If i915_wait_request
> did not take per-engine hang recovery into account, a waiting thread
> would have no way of knowing that a per-engine recovery is about to
> happen and that it needs to back off.
>
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery at intel.com>
> Signed-off-by: Ian Lister <ian.lister at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  3 +-
>  drivers/gpu/drm/i915/i915_gem.c         | 60 +++++++++++++++++++++++++++------
>  drivers/gpu/drm/i915/intel_lrc.c        |  4 +--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  4 ++-
>  4 files changed, 56 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 85cf692..5be7d3e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3033,7 +3033,8 @@ i915_gem_find_active_request(struct intel_engine_cs *ring);
>
>  bool i915_gem_retire_requests(struct drm_device *dev);
>  void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
> -int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
> +int __must_check i915_gem_check_wedge(struct drm_i915_private *dev_priv,
> +                                      struct intel_engine_cs *engine,
>                                        bool interruptible);
> 
>  static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e3cfed2..e6eb45d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -80,12 +80,38 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
>          spin_unlock(&dev_priv->mm.object_stat_lock);
>  }
> 
> +static inline int
> +i915_engine_reset_in_progress(struct drm_i915_private *dev_priv,
> +                              struct intel_engine_cs *engine)
> +{
> +        int ret = 0;
> +
> +        if (engine) {
> +                ret = !!(atomic_read(&dev_priv->ring[engine->id].hangcheck.flags)
> +                         & I915_ENGINE_RESET_IN_PROGRESS);
> +        } else {
> +                int i;
> +
> +                for (i = 0; i < I915_NUM_RINGS; i++)
> +                        if (atomic_read(&dev_priv->ring[i].hangcheck.flags)
> +                            & I915_ENGINE_RESET_IN_PROGRESS) {
> +                                ret = 1;
> +                                break;
> +                        }
> +        }
Since this side will be called far more often than the writer, could you
not make this more convenient for the reader and move it to a global set
of flags in dev_priv->gpu_error?
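Something like this, say (just a sketch; engine_reset_flags here is a
hypothetical unsigned long in i915_gpu_error with one bit per ring, not
what the patch currently has):

        static inline bool
        i915_engine_reset_in_progress(struct i915_gpu_error *error,
                                      struct intel_engine_cs *engine)
        {
                /* engine_reset_flags: hypothetical per-ring bitmask in
                 * i915_gpu_error, bit == engine->id
                 */
                if (engine)
                        return test_bit(engine->id, &error->engine_reset_flags);

                return READ_ONCE(error->engine_reset_flags) != 0;
        }

The writer sets/clears its bit with set_bit()/clear_bit() around the
per-engine reset, and every reader is then a single load plus a test
rather than a walk over all the rings.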
To avoid regressing on the EIO front, the waiter sequence should look
like

        if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
                return 0;

        if (flags & LOCKED &&
            i915_engine_reset_in_progress(&req->i915->gpu_error, req->engine))
                return -EAGAIN;
Oh, and don't add a second boolean to __i915_wait_request, just
transform the first into flags.
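That is, something along the lines of (a sketch only; the flag names
are invented):

        /* hypothetical flag names, not from the patch */
        #define I915_WAIT_INTERRUPTIBLE BIT(0)
        #define I915_WAIT_LOCKED        BIT(1)

        int __i915_wait_request(struct drm_i915_gem_request *req,
                                unsigned int flags,
                                s64 *timeout,
                                struct intel_rps_client *rps);

Then interruptible becomes flags & I915_WAIT_INTERRUPTIBLE, the check
above keys off flags & I915_WAIT_LOCKED, and there is room to grow
without sprouting more bool parameters.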
-Chris
--
Chris Wilson, Intel Open Source Technology Centre