[Intel-gfx] [PATCH v7 02/20] drm/i915: Modify error handler for per engine hang recovery
Chris Wilson
chris at chris-wilson.co.uk
Fri May 12 21:09:23 UTC 2017
On Fri, May 12, 2017 at 01:55:11PM -0700, Michel Thierry wrote:
> On 5/8/2017 11:31 AM, Michel Thierry wrote:
> >On 4/29/2017 7:19 AM, Chris Wilson wrote:
> >>On Thu, Apr 27, 2017 at 04:12:42PM -0700, Michel Thierry wrote:
> >>>From: Arun Siluvery <arun.siluvery at linux.intel.com>
> >>>
> ...
> >>>+ }
> >>>+
> >>> intel_prepare_reset(dev_priv);
> >>>
> >>> set_bit(I915_RESET_HANDOFF, &dev_priv->gpu_error.flags);
> >>>@@ -2680,6 +2706,7 @@ static void i915_reset_and_wakeup(struct
> >>>drm_i915_private *dev_priv)
> >>> kobject_uevent_env(kobj,
> >>> KOBJ_CHANGE, reset_done_event);
> >>>
> >>>+finish:
> >>> /*
> >>> * Note: The wake_up also serves as a memory barrier so that
> >>> * waiters see the updated value of the dev_priv->gpu_error.
> >>>@@ -2781,7 +2808,7 @@ void i915_handle_error(struct drm_i915_private
> >>>*dev_priv,
> >>> &dev_priv->gpu_error.flags))
> >>> goto out;
> >>>
> >>>- i915_reset_and_wakeup(dev_priv);
> >>>+ i915_reset_and_wakeup(dev_priv, engine_mask);
> >>
> >>? You don't need to wakeup the struct_mutex so we don't need this after
> >>per-engine resets. Time to split up i915_reset_and_wakeup(), because we
> >>certainly shouldn't be calling intel_finish_reset() without first calling
> >>intel_prepare_reset(). Which is right here in my tree...
> >>
> >
> >Looking at your tree, it wouldn't call finish_reset there either, only
> >these two are called after a successful reset:
> >
> >finish:
> > clear_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags);
> > wake_up_all(&dev_priv->gpu_error.reset_queue);
> >
> >But you're right, we only need to clear the error flag, no need to call
> >wake_up_all.
> >
> >Should I move the per-engine reset to i915_handle_error, and then leave
> >i915_reset_and_wakeup just for full resets?
> >That would also make the promotion from per-engine to global look a bit
> >'clearer'.
> >
>
> I just noticed an issue if I don't call wake_up_all. There can be
> someone else waiting for the reset to complete
> (i915_mutex_lock_interruptible -> i915_gem_wait_for_error).
>
> I915_RESET_BACKOFF has/had 2 roles, stop any other user to grab the
> struct mutex (which we won't need in reset-engine) and prevent two
> concurrent reset attempts (which we still want). Time to add a new
> flag for the later? (I915_RESET_ENGINE_IN_PROGRESS?)
Yes, that would be a good idea to avoid dual purposing the bits. Now
that we do direct resets along the wait path, we can completely drop the
i915_mutex_interruptible(). (No one else should be holding the mutex
indefinitely.) I think that's a better approach -- I think we've already
moved all the EIO magic aware to the ABI points where we deemed it
necessary.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list