[Intel-gfx] [PATCH 3/7] drm/i915/tdr: Add support for per engine reset recovery

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 19 19:53:55 UTC 2016


On Mon, Sep 19, 2016 at 04:30:15PM +0100, Matthew Auld wrote:
> From: "arun.siluvery at linux.intel.com" <arun.siluvery at linux.intel.com>
> 
> This change implements support for per-engine reset as an initial, less
> intrusive hang recovery option to be attempted before falling back to the
> legacy full GPU reset recovery mode if necessary. This is only supported
> from Gen8 onwards.
> 
> Hangchecker determines which engines are hung and invokes error handler to
> recover from it. Error handler schedules recovery for each of those engines
> that are hung. The recovery procedure is as follows,
>  - identifies the request that caused the hang and it is dropped
>  - force engine to idle: this is done by issuing a reset request
>  - reset and re-init engine
>  - restart submissions to the engine
> 
> If engine reset fails then we fall back to heavy weight full gpu reset
> which resets all engines and reinitiazes complete state of HW and SW.
> 
> v2
>   - rebase
> 
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c     | 59 +++++++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_drv.h     |  3 ++
>  drivers/gpu/drm/i915/i915_gem.c     |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c    | 10 +++++++
>  drivers/gpu/drm/i915/intel_lrc.h    |  1 +
>  drivers/gpu/drm/i915/intel_uncore.c | 41 +++++++++++++++++++++++---
>  6 files changed, 105 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 99fa690..8625207 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1812,21 +1812,68 @@ error:
>   * Returns zero on successful reset or otherwise an error code.
>   *
>   * Procedure is fairly simple:
> - *  - force engine to idle
> - *  - save current state which includes head and current request
> - *  - reset engine
> - *  - restore saved state and resubmit context
> + *    - identifies the request that caused the hang and it is dropped
> + *    - force engine to idle: this is done by issuing a reset request
> + *    - reset engine
> + *    - restart submissions to the engine
>   */
>  int i915_reset_engine(struct intel_engine_cs *engine)
>  {
>  	int ret;
>  	struct drm_i915_private *dev_priv = engine->i915;
>  
> -	/* FIXME: replace me with engine reset sequence */
> -	ret = -ENODEV;
> +	/*
> +	 * We need to first idle the engine by issuing a reset request,
> +	 * then perform soft reset and re-initialize hw state, for all of
> +	 * this GT power need to be awake so ensure it does throughout the
> +	 * process
> +	 */
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	/*
> +	 * the request that caused the hang is stuck on elsp, identify the
> +	 * active request and drop it, adjust head to skip the offending
> +	 * request to resume executing remaining requests in the queue.
> +	 */
> +	i915_gem_reset_engine(engine);
> +
> +	ret = intel_engine_reset_begin(engine);
> +	if (ret) {
> +		DRM_ERROR("Failed to disable %s\n", engine->name);
> +		goto error;
> +	}
> +
> +	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> +	if (ret) {
> +		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
> +		intel_engine_reset_cancel(engine);
> +		goto error;
> +	}

Ordering is still broken.

> +
> +	ret = engine->init_hw(engine);
> +	if (ret)
> +		goto error;
> +
> +	intel_engine_reset_cancel(engine);
> +	intel_execlists_restart_submission(engine);

This is broken.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list