[Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

Mika Kuoppala mika.kuoppala at linux.intel.com
Mon Mar 5 13:59:43 UTC 2018


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Similar to the staging around handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
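
To spell out the window being closed, as I read it (simplified, and
ignoring the second submit_request pass):

	CPU A: __i915_request_add()       CPU B: i915_gem_set_wedged()
	---------------------------       ----------------------------
	if (engine->schedule) /* true */
	                                  engine->schedule = NULL;
	                                  engine->cancel_requests(engine);
	                                    (moves execlists->queue onto
	                                     the timeline)
	engine->schedule(rq, prio);
	  (can insert rq into execlists->queue,
	   where it is never completed)

With the callback read under rcu_read_lock() and the synchronize_rcu()
in set_wedged between the two passes, CPU B cannot reach
cancel_requests() while CPU A is still inside the callback.
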
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
>  drivers/gpu/drm/i915/i915_request.c |  2 ++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..8d913d833ab9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>  
>  	rq = to_request(fence);
>  	engine = rq->engine;
> -	if (!engine->schedule)
> -		return;
>  
> -	engine->schedule(rq, prio);
> +	rcu_read_lock();
> +	if (engine->schedule)
> +		engine->schedule(rq, prio);
> +	rcu_read_unlock();
>  }
>  
>  static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	 */
>  	for_each_engine(engine, i915, id) {
>  		i915_gem_reset_prepare_engine(engine);
> +
>  		engine->submit_request = nop_submit_request;
> +		engine->schedule = NULL;

Why are we not using the rcu_assign_pointer() and rcu_dereference()
pair in the upper part where we check engine->schedule?
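
Something like this is what I had in mind (just a sketch, and it
assumes engine->schedule would be annotated __rcu):

	/* writer side, in i915_gem_set_wedged() */
	rcu_assign_pointer(engine->schedule, NULL);

	/* reader side, e.g. in __fence_set_priority() */
	void (*schedule)(struct i915_request *rq, int prio);

	rcu_read_lock();
	schedule = rcu_dereference(engine->schedule);
	if (schedule)
		schedule(rq, prio);
	rcu_read_unlock();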

Further, is there a risk that we lose sync between the two
assignments? In other words, should we combine both callbacks
behind a single dereferenceable pointer in the engine struct?
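
I.e. something along these lines (engine_ops and nop_engine_ops are
made-up names, just to illustrate):

	struct engine_ops {
		void (*submit_request)(struct i915_request *rq);
		void (*schedule)(struct i915_request *rq, int prio);
	};

	/* in struct intel_engine_cs */
	const struct engine_ops __rcu *ops;

	/* set_wedged then swaps both callbacks in a single store */
	rcu_assign_pointer(engine->ops, &nop_engine_ops);
	synchronize_rcu();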

-Mika


>  	}
> +	i915->caps.scheduler = 0;
>  
>  	/*
>  	 * Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		 * start to complete all requests.
>  		 */
>  		engine->submit_request = nop_complete_submit_request;
> -		engine->schedule = NULL;
>  	}
>  
> -	i915->caps.scheduler = 0;
> -
>  	/*
>  	 * Make sure no request can slip through without getting completed by
>  	 * either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2265bb8ff4fa..59a87afd83b6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>  	 * decide whether to preempt the entire chain so that it is ready to
>  	 * run at the earliest possible convenience.
>  	 */
> +	rcu_read_lock();
>  	if (engine->schedule)
>  		engine->schedule(request, request->ctx->priority);
> +	rcu_read_unlock();
>  
>  	local_bh_disable();
>  	i915_sw_fence_commit(&request->submit);
> -- 
> 2.16.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

