[PATCH 10/14] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress

Chris Wilson chris at chris-wilson.co.uk
Fri Jun 3 08:50:21 UTC 2016


On Fri, Jun 03, 2016 at 09:29:44AM +0100, Arun Siluvery wrote:
> i915_gem_check_wedge now returns a non-zero result in three different cases:
> 
> 1. Legacy: A hang has been detected and full GPU reset is in progress.
> 
> 2. Per-engine recovery:
>    a. A single engine reference can be passed to the function, in which
>    case only that engine will be checked. If that particular engine is
>    detected to be hung and is to be reset this will yield a non-zero result
>    but not if reset is in progress for any other engine.
> 
>    b. No engine reference is passed to the function, in which case all
>    engines are checked for ongoing per-engine hang recovery.
> 
> __i915_wait_request() is updated such that if an engine reset is pending,
> we request the waiter to try again so that engine recovery can continue.
> If i915_wait_request does not take per-engine hang recovery into account
> there is no way for a waiting thread to know that a per-engine recovery is
> about to happen and that it needs to back off.

> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> Signed-off-by: Ian Lister <ian.lister at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 43 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f9773ac..6cbbb9f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -79,12 +79,31 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
>  	spin_unlock(&dev_priv->mm.object_stat_lock);
>  }
>  
> +static bool i915_engine_reset_pending(struct i915_gpu_error *error,
> +				     struct intel_engine_cs *engine)
> +{
> +	int i;
> +
> +	if (engine)
> +		return i915_engine_reset_in_progress(error, engine->id);
> +
> +	for (i = 0; i < I915_NUM_ENGINES; ++i) {
> +		if (i915_engine_reset_in_progress(error, i))
> +			return true;
> +	}

No. Express what we actually want, which is

gpu_error {
	wait_queue_head_t waiters;
	atomic_t reset_pending;
}


Then whenever reset requires the mutex, it bumps reset_pending, calls
wake_up_all(&waiters), and then waits for the mutex.

i.e. drop the reset bit encoding from the global reset counter.
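
For illustration only, the handshake above could look roughly like the
following userspace sketch. It is not i915 code: pthread_cond_t stands in for
wait_queue_head_t, atomic_int for atomic_t, and the struct name, the helper
names, request_done and reset_count are all hypothetical. A waiter that sees
reset_pending returns -EAGAIN so its caller can drop the mutex and retry,
which lets the reset path take the mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the suggested gpu_error fields. */
struct gpu_error_sketch {
	pthread_mutex_t struct_mutex;	/* big lock that the reset path needs */
	pthread_mutex_t wq_lock;	/* guards the condition variable */
	pthread_cond_t waiters;		/* stand-in for wait_queue_head_t */
	atomic_int reset_pending;	/* stand-in for atomic_t */
	atomic_bool request_done;	/* hypothetical completion flag */
	int reset_count;		/* completed resets, under struct_mutex */
};

void gpu_error_init(struct gpu_error_sketch *e)
{
	pthread_mutex_init(&e->struct_mutex, NULL);
	pthread_mutex_init(&e->wq_lock, NULL);
	pthread_cond_init(&e->waiters, NULL);
	atomic_init(&e->reset_pending, 0);
	atomic_init(&e->request_done, false);
	e->reset_count = 0;
}

/* Waiter: 0 once the request completes, -EAGAIN if a reset is pending. */
int wait_request(struct gpu_error_sketch *e)
{
	int ret = 0;

	pthread_mutex_lock(&e->wq_lock);
	for (;;) {
		if (atomic_load(&e->reset_pending)) {
			ret = -EAGAIN;	/* back off so reset can proceed */
			break;
		}
		if (atomic_load(&e->request_done))
			break;
		pthread_cond_wait(&e->waiters, &e->wq_lock);
	}
	pthread_mutex_unlock(&e->wq_lock);
	return ret;
}

/* Completion: mark the request done and wake everyone waiting on it. */
void complete_request(struct gpu_error_sketch *e)
{
	atomic_store(&e->request_done, true);
	pthread_mutex_lock(&e->wq_lock);
	pthread_cond_broadcast(&e->waiters);
	pthread_mutex_unlock(&e->wq_lock);
}

/* Reset: flag it, kick the waiters, and only then wait for the mutex. */
void request_reset(struct gpu_error_sketch *e)
{
	int prev;

	atomic_fetch_add(&e->reset_pending, 1);
	pthread_mutex_lock(&e->wq_lock);
	pthread_cond_broadcast(&e->waiters);
	pthread_mutex_unlock(&e->wq_lock);

	pthread_mutex_lock(&e->struct_mutex);
	e->reset_count++;		/* the actual reset work would go here */
	prev = atomic_fetch_sub(&e->reset_pending, 1);
	assert(prev > 0);		/* pending count must stay balanced */
	pthread_mutex_unlock(&e->struct_mutex);
}
```

In this sketch the broadcast is issued under wq_lock so that a waiter which
has just tested reset_pending cannot miss the wakeup before it sleeps.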
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

