[Intel-gfx] [PATCH 2/2] drm/i915: Detect a failed GPU reset+recovery

Mon Jan 16 11:18:23 UTC 2017

On Mon, Jan 16, 2017 at 09:42:52AM +0000, Chris Wilson wrote:
> If we can't recover the GPU after the reset, mark it as wedged to cancel
> the outstanding tasks and to prevent new users from trying to use the
> broken GPU.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> -void i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
> +int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
> +	int err = 0;
>  
>  	/* Ensure irq handler finishes, and not run again. */
> -	for_each_engine(engine, dev_priv, id)
> +	for_each_engine(engine, dev_priv, id) {
> +		struct drm_i915_gem_request *request;
> +
>  		tasklet_kill(&engine->irq_tasklet);
>  
> +		request = i915_gem_find_active_request(engine);
> +		if (request && request->fence.error == -EIO)
> +			err = -EIO; /* Previous reset failed! */

This should also check that it is this engine that has been declared
hung, as we may not have given the GPU the chance to even execute the
requests left over from the previous reset.
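
As a rough sketch of what such a per-engine check could look like (toy
stand-in types only, not the real i915 structures: toy_engine,
toy_request, the declared_hung flag and toy_reset_prepare() are all
hypothetical names invented for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; fence_error models request->fence.error. */
struct toy_request {
	int fence_error;
};

/* Hypothetical stand-in for an engine. */
struct toy_engine {
	bool declared_hung;          /* assumption: set by hang detection for this engine */
	struct toy_request *active;  /* models the result of i915_gem_find_active_request() */
};

/*
 * Report -EIO only when an engine that is itself declared hung still has
 * an active request carrying -EIO from the previous reset. Engines that
 * are not declared hung are skipped, since they may simply not have had
 * a chance to execute anything since that reset.
 */
static int toy_reset_prepare(const struct toy_engine *engines, size_t count)
{
	int err = 0;

	assert(engines != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		const struct toy_request *request = engines[i].active;

		if (!engines[i].declared_hung)
			continue;

		if (request && request->fence_error == -EIO)
			err = -EIO; /* previous reset failed on a hung engine */
	}

	return err;
}
```

With this shape, a stale -EIO on an engine that merely has not run yet
does not by itself cause the reset to be treated as failed.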
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre