[PATCH 5/9] drm/i915: More surgically unbreak the modeset vs reset deadlock
Chris Wilson
chris at chris-wilson.co.uk
Wed Jul 19 13:42:21 UTC 2017
Quoting Daniel Vetter (2017-07-19 13:54:58)
> There's no reason to entirely wedge the gpu, for the minimal deadlock
> bugfix we only need to unbreak/decouple the atomic commit from the gpu
> reset. The simplest wait to fix that is by replacing the
> unconditional fence wait a the top of commit_tail by a wait which
> completes either when the fences are done (normal case, or when a
> reset doesn't need to touch the display state). Or when the gpu reset
> needs to force-unblock all pending modeset states.
>
> Note that in both cases TDR itself keeps working, so from a userspace
> pov this trickery isn't observable. Users themselvs might spot a short
> glitch while the rendering is catching up again, but that's still
> better than pre-TDR where we've thrown away all the rendering,
> including innocent batches. Also, this fixes the regression TDR
> introduced of making gpu resets deadlock-prone when we do need to
> touch the display.
>
> One thing I noticed is that gpu_error.flags seems to use both our own
> wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
> facilities. Not entirely sure why this inconsistency exists, I just
> picked one style.
>
> A possible future avenue could be to insert the gpu reset in-between
> ongoing modeset changes, which would avoid the momentary glitch. But
> that's a lot more work to implement in the atomic commit machinery,
> and given that we only need this for pre-g4x hw, of questionable
> utility just for the sake of polishing gpu reset even more on those
> old boxes. It might be useful for other features though.
>
> v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.
>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 1 +
> drivers/gpu/drm/i915/intel_display.c | 35 ++++++++++++++++++++++++++++++-----
> 2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 07e98b07c5bc..369968539b40 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1564,6 +1564,7 @@ struct i915_gpu_error {
> unsigned long flags;
> #define I915_RESET_BACKOFF 0
> #define I915_RESET_HANDOFF 1
> +#define I915_RESET_MODESET 2
> #define I915_WEDGED (BITS_PER_LONG - 1)
> #define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 5aa7ca1ab592..4762f158032d 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3471,10 +3471,9 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
> !gpu_reset_clobbers_display(dev_priv))
> return;
>
> - /* We have a modeset vs reset deadlock, defensively unbreak it.
> - *
> - * FIXME: We can do a _lot_ better, this is just a first iteration.*/
> - i915_gem_set_wedged(dev_priv);
> + /* We have a modeset vs reset deadlock, defensively unbreak it. */
> + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
> + wake_up_all(&dev_priv->gpu_error.wait_queue);
How are we breaking the
modeset_lock -> struct_mutex -> wait_on_reset ?
We wait the modeset_lock next which stops the reset from
proceeding, and so the deadlock persists until the wedge-me timeout?
-Chris
More information about the dri-devel
mailing list