[Intel-gfx] [PATCH] drm/i915: Split I915_RESET_IN_PROGRESS into two flags

Thu Feb 23 19:29:59 UTC 2017


On 23/02/17 08:59, Chris Wilson wrote:
> I915_RESET_IN_PROGRESS is being used for both signaling the requirement
> to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
> to instruct a waiter (already holding the struct_mutex) to perform the
> reset. To allow for a little more coordination, split these two meanings
> into a couple of distinct flags. I915_RESET_BACKOFF tells
> i915_mutex_lock_interruptible() not to acquire the mutex and
> I915_RESET_HANDOFF tells the waiter to call i915_reset().
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> This is part of a much bigger problem to try and restore the balance
> between atomic modeset and resets. However, now that the waiter has
> been revamped, this patch should help us ease forward with TDR.
> -Chris
> ---

...

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index eed9ead1b592..7e9b1a008134 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h


The comment above reset_count will be stale now; it should become:

@@ -1558,7 +1558,7 @@ struct i915_gpu_error {
          *
          * This is a counter which gets incremented when reset is triggered,
          *
-        * Before the reset commences, the I915_RESET_IN_PROGRESS bit is set
+        * Before the reset commences, the I915_RESET_BACKOFF bit is set
          * meaning that any waiters holding onto the struct_mutex should
          * relinquish the lock immediately in order for the reset to start.
          *


> @@ -1578,8 +1578,33 @@ struct i915_gpu_error {
>          */
>         unsigned long reset_count;
>
> +       /**
> +        * flags: Control various stages of the GPU reset
> +        *
> +        * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
> +        * other users acquiring the struct_mutex. To do this we set the
> +        * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
> +        * and then check for that bit before acquiring the struct_mutex (in
> +        * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
> +        * secondary role in preventing two concurrent global reset attempts.
> +        *
> +        * #I915_RESET_HANDOFF - To perform the actual GPU reset, we need the
> +        * struct_mutex. We try to acquire the struct_mutex in the reset worker,
> +        * but it may be held by some long running waiter (that we cannot
> +        * interrupt without causing trouble). Once we are ready to do the GPU
> +        * reset, we set the I915_RESET_HANDOFF bit and wake up any waiters. If
> +        * they already hold the struct_mutex and want to participate they can
> +        * inspect the bit and do the reset directly, otherwise the worker
> +        * waits for the struct_mutex.
> +        *
> +        * #I915_WEDGED - If reset fails and we can no longer use the GPU,
> +        * we set the #I915_WEDGED bit. Prior to command submission, e.g.
> +        * i915_gem_request_alloc(), this bit is checked and the sequence
> +        * aborted (with -EIO reported to userspace) if set.
> +        */
>         unsigned long flags;
> -#define I915_RESET_IN_PROGRESS 0
> +#define I915_RESET_BACKOFF     0
> +#define I915_RESET_HANDOFF     1
>  #define I915_WEDGED            (BITS_PER_LONG - 1)
>
>         /**
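
The sketch below is a single-threaded userspace model in C11 that shows how the three bits divide the work. It is not the i915 code. `struct gpu_error_model` and every `model_*` helper are hypothetical. The C11 atomics stand in for the kernel's bitops. The gates return `-EAGAIN` instead of sleeping. Claiming HANDOFF with an atomic test-and-clear is an assumption: it is one way to make sure exactly one party, either a waiter or the worker, performs the reset. Clearing BACKOFF at the end of the reset is likewise assumed, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#ifndef BITS_PER_LONG
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#endif
/* Bit numbers as in the patch. */
#define I915_RESET_BACKOFF 0
#define I915_RESET_HANDOFF 1
#define I915_WEDGED        (BITS_PER_LONG - 1)

/* Hypothetical stand-in for struct i915_gpu_error: only the flags word. */
struct gpu_error_model {
	atomic_ulong flags;
};

static unsigned long bit_mask(int bit)
{
	assert(bit >= 0 && bit < BITS_PER_LONG);
	return 1UL << bit;
}

static bool test_flag(struct gpu_error_model *e, int bit)
{
	return (atomic_load(&e->flags) & bit_mask(bit)) != 0;
}

/* Atomically set/clear a bit and report its previous value, in the spirit
 * of the kernel's test_and_set_bit()/test_and_clear_bit(). */
static bool test_and_set_flag(struct gpu_error_model *e, int bit)
{
	return (atomic_fetch_or(&e->flags, bit_mask(bit)) & bit_mask(bit)) != 0;
}

static bool test_and_clear_flag(struct gpu_error_model *e, int bit)
{
	return (atomic_fetch_and(&e->flags, ~bit_mask(bit)) & bit_mask(bit)) != 0;
}

/* I915_RESET_BACKOFF: checked before taking struct_mutex. Returns -EAGAIN
 * rather than sleeping so that the model stays single-threaded. */
static int model_lock_gate(struct gpu_error_model *e)
{
	return test_flag(e, I915_RESET_BACKOFF) ? -EAGAIN : 0;
}

/* Reset worker, step 1: fails if a global reset is already in flight
 * (BACKOFF's secondary role). */
static bool model_begin_reset(struct gpu_error_model *e)
{
	return !test_and_set_flag(e, I915_RESET_BACKOFF);
}

/* Reset worker, step 2: ready to reset, so offer the job to waiters.
 * The driver would also wake the waiters at this point. */
static void model_offer_handoff(struct gpu_error_model *e)
{
	(void)test_and_set_flag(e, I915_RESET_HANDOFF);
}

/* Run by whoever holds struct_mutex: a waiter, or the worker once it gets
 * the lock. Only the caller that clears HANDOFF performs the reset;
 * reset_ok simulates whether the hardware reset succeeded. */
static bool model_try_reset(struct gpu_error_model *e, bool reset_ok)
{
	if (!test_and_clear_flag(e, I915_RESET_HANDOFF))
		return false; /* not offered yet, or already handled */
	if (!reset_ok)
		(void)test_and_set_flag(e, I915_WEDGED);
	return true;
}

/* Reset worker, final step (assumed): let users take struct_mutex again. */
static void model_end_reset(struct gpu_error_model *e)
{
	(void)test_and_clear_flag(e, I915_RESET_BACKOFF);
}

/* I915_WEDGED: checked before command submission; -EIO goes to userspace. */
static int model_submit_gate(struct gpu_error_model *e)
{
	return test_flag(e, I915_WEDGED) ? -EIO : 0;
}
```

Because HANDOFF is consumed with a single atomic test-and-clear, a waiter that already holds struct_mutex and the worker that later acquires it can both call `model_try_reset()`. Whichever clears the bit first does the reset, and the other sees that nothing is left to do.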

I've been looking forward to this change.


Acked-by: Michel Thierry <michel.thierry at intel.com>