[Intel-gfx] [PATCH] drm/i915: Don't unwedge if reset is disabled

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Fri Sep 6 22:28:05 UTC 2019



On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> When trying to reset a device with reset capability disabled or not
> supported while rings are full of requests, it has been observed when
> running in execlists submission mode that command stream buffer tail
> tends to be incremented by apparently still running GPU regardless of
> all requests being already cancelled and command stream buffer pointers
> reset.  As a result, kernel panic on NULL pointer dereference occurs
> when a trace_ports() helper is called with command stream buffer tail
> incremented but request pointers being NULL during final
> __intel_gt_set_wedged() operation called from intel_gt_reset().
> 
> Skip actual reset procedure if reset is disabled or not supported.

This last sentence is a bit confusing. You're not skipping the reset 
procedure, you're skipping the attempt of unwedging and resetting again 
after a reset & wedge already happened.

> 
> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 26 ++++++++++++++++++--------
>   1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index b9d84d52e986..d75da124e280 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -932,25 +932,35 @@ void intel_gt_reset(struct intel_gt *gt,
>   	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
>   	mutex_lock(&gt->reset.mutex);
>   
> -	/* Clear any previous failed attempts at recovery. Time to try again. */
> -	if (!__intel_gt_unset_wedged(gt))
> -		goto unlock;
> -

Since you're anyway checking the wedged status and reset support 
multiple times, wouldn't it have been better to just add a single check 
at the beginning? e.g.

	/* we can't recover a wedged GT without reset */
	if (!intel_has_gpu_reset(gt->i915) && intel_gt_is_wedged(gt))
		goto unlock;

Daniele

>   	if (reason)
>   		dev_notice(gt->i915->drm.dev,
>   			   "Resetting chip for %s\n", reason);
> -	atomic_inc(&gt->i915->gpu_error.reset_count);
> -
> -	awake = reset_prepare(gt);
>   
>   	if (!intel_has_gpu_reset(gt->i915)) {
>   		if (i915_modparams.reset)
>   			dev_err(gt->i915->drm.dev, "GPU reset not supported\n");
>   		else
>   			DRM_DEBUG_DRIVER("GPU reset disabled\n");
> -		goto error;
> +
> +		/*
> +		 * Don't unwedge if reset is disabled or not supported
> +		 * because we can't guarantee what the hardware status is.
> +		 */
> +		if (intel_gt_is_wedged(gt))
> +			goto unlock;
>   	}
>   
> +	/* Clear any previous failed attempts at recovery. Time to try again. */
> +	if (!__intel_gt_unset_wedged(gt))
> +		goto unlock;
> +
> +	atomic_inc(&gt->i915->gpu_error.reset_count);
> +
> +	awake = reset_prepare(gt);
> +
> +	if (!intel_has_gpu_reset(gt->i915))
> +		goto error;
> +
>   	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
>   		intel_runtime_pm_disable_interrupts(gt->i915);
>   
> 


More information about the Intel-gfx mailing list