[Intel-gfx] [PATCH v2] drm/i915: Drain the device workqueue on unload

Wed Jul 19 11:18:47 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Workers on the i915->wq may rearm themselves so for completeness we need
> to replace our flush_workqueue() with a call to drain_workqueue() before
> unloading the device.
>
> v2: Reinforce the drain_workqueue() with a preceding rcu_barrier() as a
> few of the tasks that need to be drained may first be armed by RCU.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
>  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
>  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
>  3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4b62fd012877..41c5b11a7c8f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>  
>  static void i915_gem_fini(struct drm_i915_private *dev_priv)
>  {
> -	flush_workqueue(dev_priv->wq);
> +	/* Flush any outstanding unpin_work. */
> +	i915_gem_drain_workqueue(dev_priv);
>  
>  	mutex_lock(&dev_priv->drm.struct_mutex);
>  	intel_uc_fini_hw(dev_priv);
> @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
>  	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>  	i915_reset_error_state(dev_priv);
>  
> -	/* Flush any outstanding unpin_work. */
> -	drain_workqueue(dev_priv->wq);
> -
>  	i915_gem_fini(dev_priv);
>  	intel_uc_fini_fw(dev_priv);
>  	intel_fbc_cleanup_cfb(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 667fb5c44483..e9a4b96dc775 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>  	} while (flush_work(&i915->mm.free_work));
>  }
>  
> +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> +{
> +	/*
> +	 * Similar to objects above (see i915_gem_drain_freed_objects), in
> +	 * general we have workers that are armed by RCU and then rearm
> +	 * themselves in their callbacks. To be paranoid, we need to
> +	 * drain the workqueue a second time after waiting for the RCU
> +	 * grace period so that we catch work queued via RCU from the first
> +	 * pass. As neither drain_workqueue() nor flush_workqueue() reports
> +	 * a result, we make an assumption that we don't require more
> +	 * than 2 passes to catch all recursive RCU delayed work.
> +	 *
> +	 */
> +	int pass = 2;
> +	do {
> +		rcu_barrier();
> +		drain_workqueue(i915->wq);

I am fine with the paranoia, and it covers the case below. Still, consider
what happens if we do:

drain_workqueue();
rcu_barrier();

With draining in progress, only chain queuing is allowed. I understand
this to mean that when drain_workqueue() returns, all the ctx pointers
are unreferenced but not yet freed.

Thus the rcu_barrier() after it cleans up the trash and we are good to
be unloaded, with one pass.

I guess it comes down to how to read the comment, so could you
elaborate on 'we have workers that are armed by RCU and then rearm
themselves'? From the drain_workqueue() description, this should be
covered.
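
To make the question concrete, here is a toy userspace model of the two
orderings. It is not kernel code: every toy_* name is made up for
illustration, and it assumes that the only way a worker gets onto the
queue is from an RCU callback, and that a worker re-arms itself through
RCU rather than by chain queuing. Whether the real i915 workers match
that assumption is exactly what I am asking.

```c
#include <assert.h>

#define MAX_ITEMS 16

/*
 * Toy model (hypothetical, not kernel code). Each entry is a "rearm
 * count": how many more times the worker re-arms itself via RCU.
 */
struct toy_dev {
	int rcu[MAX_ITEMS];	/* pending RCU callbacks */
	int nrcu;
	int work[MAX_ITEMS];	/* work items sitting on the workqueue */
	int nwork;
};

/* Stand-in for call_rcu(): defer a callback that will queue the worker. */
static void toy_call_rcu(struct toy_dev *d, int rearm)
{
	assert(d->nrcu < MAX_ITEMS);
	d->rcu[d->nrcu++] = rearm;
}

/* Stand-in for rcu_barrier(): run every pending callback; each one
 * queues its worker. */
static void toy_rcu_barrier(struct toy_dev *d)
{
	int i, n = d->nrcu;

	for (i = 0; i < n; i++) {
		assert(d->nwork < MAX_ITEMS);
		d->work[d->nwork++] = d->rcu[i];
	}
	d->nrcu = 0;
}

/* Stand-in for drain_workqueue(): run the queue until it is empty. A
 * worker with rearm > 0 re-arms itself via RCU, not via the queue. */
static void toy_drain_workqueue(struct toy_dev *d)
{
	while (d->nwork > 0) {
		int rearm = d->work[--d->nwork];

		if (rearm > 0)
			toy_call_rcu(d, rearm - 1);
	}
}

/* Ordering from the patch: rcu_barrier() then drain, two passes. */
static int toy_patch_order(struct toy_dev *d)
{
	int pass = 2;

	do {
		toy_rcu_barrier(d);
		toy_drain_workqueue(d);
	} while (--pass);

	return d->nrcu + d->nwork;	/* anything left behind */
}

/* Ordering from this reply: drain once, then rcu_barrier(). */
static int toy_reply_order(struct toy_dev *d)
{
	toy_drain_workqueue(d);
	toy_rcu_barrier(d);

	return d->nrcu + d->nwork;
}

/* Helper: a device with one RCU callback still pending at unload. */
static struct toy_dev toy_dev_with_pending_rcu(int rearm)
{
	struct toy_dev d = { .nrcu = 0, .nwork = 0 };

	toy_call_rcu(&d, rearm);
	return d;
}
```

In this model, an RCU callback that is still pending when the drain
finishes leaves one work item queued under the single-pass ordering,
while the two-pass ordering from the patch ends up empty as long as the
worker re-arms through RCU at most once.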

Thanks,
-Mika

> +	} while (--pass);
> +}
> +
>  struct i915_vma * __must_check
>  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>  			 const struct i915_ggtt_view *view,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 47613d20bba8..7a468cb30946 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
>  
>  	cancel_delayed_work_sync(&i915->gt.retire_work);
>  	cancel_delayed_work_sync(&i915->gt.idle_work);
> -	flush_workqueue(i915->wq);
> +	i915_gem_drain_workqueue(i915);
>  
>  	mutex_lock(&i915->drm.struct_mutex);
>  	for_each_engine(engine, i915, id)
> -- 
> 2.13.3