[Intel-gfx] [PATCH 1/2] drm/i915: Don't call synchronize_rcu_expedited under struct_mutex

Fri Apr 7 13:32:30 UTC 2017

Hello Joonas,

On Fri, Apr 07, 2017 at 01:49:34PM +0300, Joonas Lahtinen wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 2978acd..129ed30 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -53,6 +53,17 @@ static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
>  	BUG();
>  }
>  
> +static void i915_gem_shrinker_unlock(struct drm_device *dev, bool unlock)
> +{
> +	if (!unlock)
> +		return;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	/* expedite the RCU grace period to free some request slabs */
> +	synchronize_rcu_expedited();
> +}
> +
>  static bool any_vma_pinned(struct drm_i915_gem_object *obj)
>  {
>  	struct i915_vma *vma;
> @@ -232,11 +243,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>  		intel_runtime_pm_put(dev_priv);
>  
>  	i915_gem_retire_requests(dev_priv);
> -	if (unlock)
> -		mutex_unlock(&dev_priv->drm.struct_mutex);
>  
> -	/* expedite the RCU grace period to free some request slabs */
> -	synchronize_rcu_expedited();
> +	i915_gem_shrinker_unlock(&dev_priv->drm, unlock);
>  }
> @@ -296,8 +304,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
>  			count += obj->base.size >> PAGE_SHIFT;
>  	}
>  
> -	if (unlock)
> -		mutex_unlock(&dev->struct_mutex);
> +	i915_gem_shrinker_unlock(dev, unlock);

Why here? This doesn't make sense, all it matters is the throttling to
happen when scan_objects is run, if count_objects is run it's not
worth it to wait a quiescent point and to call
synchronize_rcu_expedited(), it is *very* expensive. I recommend to
reverse this hunk, I think it's worsening the runtime with zero
benefits to increase the reliability of reclaim.

>  	return count;
>  }
> @@ -324,8 +331,8 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  					 sc->nr_to_scan - freed,
>  					 I915_SHRINK_BOUND |
>  					 I915_SHRINK_UNBOUND);
> -	if (unlock)
> -		mutex_unlock(&dev->struct_mutex);
> +
> +	i915_gem_shrinker_unlock(dev, unlock);
>  
>  	return freed;
>  }

Perfect here, faster than re-reading the mutex too, already thought of
checking unlock instead. Although if that's the only place it can as
well be explicit.

> @@ -367,8 +374,7 @@ i915_gem_shrinker_unlock_uninterruptible(struct drm_i915_private *dev_priv,
>  					 struct shrinker_lock_uninterruptible *slu)
>  {
>  	dev_priv->mm.interruptible = slu->was_interruptible;
> -	if (slu->unlock)
> -		mutex_unlock(&dev_priv->drm.struct_mutex);
> +	i915_gem_shrinker_unlock(&dev_priv->drm, slu->unlock);
>  }

i915_gem_shrinker_unlock_uninterruptible is more of a helper function
but it doesn't always make sense to wait a rcu grace period. I think
it'd be better to be selective in when to explicitly run
synchronize_rcu_expedited(); in fact in some case synchronize_sched()
may be prefereable and it will cause less wasted CPU cycles at the
only downside of higher latency, it's all about tradeoffs which one
should be called as they're equivalent as far as i915 is concerned. I
don't think calling synchronize_rcu_expedited(); unconditionally in a
unlock helper is ok and it'd be better to decided if to call (and
which one) on a case by case basis.

I kind I prefer my patch, just cleaned up with the
synchronize_rcu_expedited under if (unlock) { mutex_unlock;
synchronize_rcu... }.

I'd also like to see all those mutex_trylock_recursive dropped, the
only place where it makes sense to use it really is
i915_gem_shrinker_scan and i915_gem_shrinker_count, unless in fact we
want to replace it there too with a mutex_trylock and stop the
recursive behavior of reclaim into DRM code. The other cases where the
code uses lock recursion don't make much sense to me, I think the code
would be simpler if the information on the struct_mutex being hold
would be passed as parameter in all other cases. It'd be likely faster
as well for the same reason why checking "unlock" is better above than
checking the mutex.