[Intel-gfx] [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged

Fri Oct 6 11:10:39 UTC 2017

On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:

> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.

synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody agrees
on the state prior to calling syncrhonize_rcu(). So yes, no additional
ordering requirements.

>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>  
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
>  	/* Mark all executing requests as skipped */
>  	engine->cancel_requests(engine);
>  
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>  				       intel_engine_last_submit(engine));
>  }
>  
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
>  
>  	for_each_engine(engine, i915, id)
> +		engine->submit_request = nop_submit_request;
>  
> +	/* Make sure no one is running the old callback before we proceed with
> +	 * cancelling requests and resetting the completion tracking. Otherwise
> +	 * we might submit a request to the hardware which never completes.
> +	 */

ARGH @ horrid comment style..

  http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html

:-)

> +	synchronize_rcu();
>  
> +	for_each_engine(engine, i915, id)
> +		engine_complete_requests(engine);
>  
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	switch (state) {
>  	case FENCE_COMPLETE:
>  		trace_i915_gem_request_submit(request);
> +		rcu_read_lock();
>  		request->engine->submit_request(request);
> +		rcu_read_unlock();
>  		break;
>  
>  	case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>  	}
>  	i915_gem_request_get(vip);
>  	i915_add_request(vip);
> +	rcu_read_lock();
>  	request->engine->submit_request(request);
> +	rcu_read_unlock();
>  
>  	mutex_unlock(&i915->drm.struct_mutex);

Yes, this is a correct and good replacement, however, you said:

> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

This means that you could simply do synchronize_sched() without the
addition of rcu_read_lock()s and still be fine.