[Intel-gfx] [PATCH] drm/i915: Don't unwedge if reset is disabled
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Fri Sep 6 22:28:05 UTC 2019
On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> When trying to reset a device with reset capability disabled or not
> supported while rings are full of requests, it has been observed when
> running in execlists submission mode that command stream buffer tail
> tends to be incremented by apparently still running GPU regardless of
> all requests being already cancelled and command stream buffer pointers
> reset. As a result, kernel panic on NULL pointer dereference occurs
> when a trace_ports() helper is called with command stream buffer tail
> incremented but request pointers being NULL during final
> __intel_gt_set_wedged() operation called from intel_gt_reset().
>
> Skip actual reset procedure if reset is disabled or not supported.
This last sentence is a bit confusing. You're not skipping the reset
procedure, you're skipping the attempt of unwedging and resetting again
after a reset & wedge already happened.
>
> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_reset.c | 26 ++++++++++++++++++--------
> 1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index b9d84d52e986..d75da124e280 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -932,25 +932,35 @@ void intel_gt_reset(struct intel_gt *gt,
> GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, >->reset.flags));
> mutex_lock(>->reset.mutex);
>
> - /* Clear any previous failed attempts at recovery. Time to try again. */
> - if (!__intel_gt_unset_wedged(gt))
> - goto unlock;
> -
Since you're anyway checking the wedged status and reset support
multiple times, wouldn't it have been better to just add a single check
at the beginning? e.g.
/* we can't recover a wedged GT without reset */
if (!intel_has_gpu_reset(gt->i915) && intel_gt_is_wedged(gt))
goto unlock;
Daniele
> if (reason)
> dev_notice(gt->i915->drm.dev,
> "Resetting chip for %s\n", reason);
> - atomic_inc(>->i915->gpu_error.reset_count);
> -
> - awake = reset_prepare(gt);
>
> if (!intel_has_gpu_reset(gt->i915)) {
> if (i915_modparams.reset)
> dev_err(gt->i915->drm.dev, "GPU reset not supported\n");
> else
> DRM_DEBUG_DRIVER("GPU reset disabled\n");
> - goto error;
> +
> + /*
> + * Don't unwedge if reset is disabled or not supported
> + * because we can't guarantee what the hardware status is.
> + */
> + if (intel_gt_is_wedged(gt))
> + goto unlock;
> }
>
> + /* Clear any previous failed attempts at recovery. Time to try again. */
> + if (!__intel_gt_unset_wedged(gt))
> + goto unlock;
> +
> + atomic_inc(>->i915->gpu_error.reset_count);
> +
> + awake = reset_prepare(gt);
> +
> + if (!intel_has_gpu_reset(gt->i915))
> + goto error;
> +
> if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
> intel_runtime_pm_disable_interrupts(gt->i915);
>
>
More information about the Intel-gfx
mailing list