[Intel-gfx] [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock
Chris Wilson
chris at chris-wilson.co.uk
Thu Jul 20 19:47:12 UTC 2017
Quoting Daniel Vetter (2017-07-20 18:57:48)
> ... using the biggest hammer we have. This is essentially a weaponized
> version of the timeout-based wedging Chris added in
>
> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Thu Jun 22 11:56:25 2017 +0100
>
> drm/i915: Break modeset deadlocks on reset
>
> Because defense-in-depth is good it's good to still have both. Also
> note that with the locking change we can now restrict this a lot (old
> gpus and special testing only), so this doesn't kill the TDR benefits
> on at least anything remotely modern.
>
> And futuremore with a few tricks it should be possible to make a much
> more educated guess about whether an atomic commit is stuck waiting on
> the gpu (atomic_t counting the pending i915_sw_fence used by the
> atomic modeset code should do it), so we can improve this.
>
> But for now just start with something that is guaranteed to recover
> faster, for much better CI througput.
>
> This defacto reverts TDR on these platforms, but there's not really a
> single commit to specify as the sole offender.
>
> v2: Add a debug message to explain what's going on. We can't DRM_ERROR
> because that spams CI. And the timeout based fallback still prints a
> DRM_ERROR, in case something goes wrong.
>
> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> ---
> drivers/gpu/drm/i915/intel_display.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 02b1f4966049..995522e40ec1 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
> !gpu_reset_clobbers_display(dev_priv))
> return;
>
> + /* We have a modeset vs reset deadlock, defensively unbreak it.
> + *
> + * FIXME: We can do a _lot_ better, this is just a first iteration.*/
> + i915_gem_set_wedged(dev_priv);
> + DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
I meant this a dev_err(). It has user visible impact and user data loss.
-Chris
More information about the Intel-gfx
mailing list