[Intel-gfx] [PATCH 3/3] drm/i915: Stop the machine as we install the wedged submit_request handler
Chris Wilson
chris at chris-wilson.co.uk
Fri Nov 18 14:38:06 UTC 2016
On Fri, Nov 18, 2016 at 09:37:08AM +0000, Chris Wilson wrote:
> In order to prevent a race between the old callback submitting an
> incomplete request and i915_gem_set_wedged() installing its nop handler,
> we must ensure that the swap occurs when the machine is idle
> (stop_machine).
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 25 ++++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7037a8b26903..6b1df3de90f0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -38,6 +38,7 @@
> #include <linux/reservation.h>
> #include <linux/shmem_fs.h>
> #include <linux/slab.h>
> +#include <linux/stop_machine.h>
> #include <linux/swap.h>
> #include <linux/pci.h>
> #include <linux/dma-buf.h>
> @@ -2768,6 +2769,12 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>
> static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
> {
> + /* We need to be sure that no thread is running the old callback as
> + * we install the nop handler (otherwise we would submit a request
> + * to hardware that will never complete). In order to prevent this
> + * race, we wait until the machine is idle before making the swap
> + * (using stop_machine()).
> + */
> engine->submit_request = nop_submit_request;
>
> /* Mark all pending requests as complete so that any concurrent
> @@ -2798,19 +2805,27 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
> }
> }
>
> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> +static int __i915_gem_set_wedged_BKL(void *data)
> {
> + struct drm_i915_private *i915 = data;
> struct intel_engine_cs *engine;
> enum intel_engine_id id;
>
> + i915_gem_context_lost(i915);
> + for_each_engine(engine, i915, id)
> + i915_gem_cleanup_engine(engine);
> +
> + return 0;
> +}
> +
> +void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> +{
> lockdep_assert_held(&dev_priv->drm.struct_mutex);
> set_bit(I915_WEDGED, &dev_priv->gpu_error.flags);
>
> - i915_gem_context_lost(dev_priv);
> - for_each_engine(engine, dev_priv, id)
> - i915_gem_cleanup_engine(engine);
> - mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
> + stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>
> + mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
> i915_gem_retire_requests(dev_priv);
mod_delayed_work() should come after i915_gem_retire_requests(), as
retire_requests should hopefully also do the mod_delayed_work() itself
(we could prefix it with if (!active_requests) mod_delayed_work()).
Also, i915_gem_context_lost() is probably best pulled out to after the
stop_machine(), before the retire_requests. There is no reason to do that
inside the stop_machine (it's just the takeover of the fence callback
that is racy with fence signaling).
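
As a rough illustration of the ordering suggested above, here is a small
userspace sketch, not the driver code. Every mock_* name, the struct
mock_i915 type and the trace buffer are hypothetical stand-ins invented
for this example; the real functions take different arguments, and
stop_machine() is modelled as a plain call. It only records the order in
which the steps would run: engine cleanup inside the stopped section,
then context lost, then retire, then the idle work guarded by
!active_requests.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct mock_i915 {
	unsigned int active_requests;
	int wedged;
};

static char trace[128];

/* Append one step name to the trace, separated by '>'. */
static void record(const char *step)
{
	assert(strlen(trace) + strlen(step) + 2 < sizeof(trace));
	if (trace[0])
		strcat(trace, ">");
	strcat(trace, step);
}

/* Only the handler swap (engine cleanup) stays in the stopped section. */
static int mock_set_wedged_BKL(void *data)
{
	(void)data;
	record("cleanup_engines");
	return 0;
}

/* Stand-in for stop_machine(): simply runs fn(data). */
static int mock_stop_machine(int (*fn)(void *), void *data)
{
	return fn(data);
}

static void mock_context_lost(struct mock_i915 *i915)
{
	(void)i915;
	record("context_lost");
}

/* Assumption: after wedging, retiring leaves no active requests. */
static void mock_retire_requests(struct mock_i915 *i915)
{
	i915->active_requests = 0;
	record("retire_requests");
}

static void mock_mod_delayed_work(struct mock_i915 *i915)
{
	(void)i915;
	record("idle_work");
}

/* The suggested ordering; returns the recorded trace. */
static const char *mock_set_wedged(struct mock_i915 *i915)
{
	trace[0] = '\0';
	i915->wedged = 1;

	mock_stop_machine(mock_set_wedged_BKL, i915);

	mock_context_lost(i915);
	mock_retire_requests(i915);
	if (!i915->active_requests)
		mock_mod_delayed_work(i915);

	return trace;
}
```

Calling mock_set_wedged() on a struct mock_i915 with a few active
requests shows the idle work being kicked only once nothing is left
active.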
-Chris
--
Chris Wilson, Intel Open Source Technology Centre