[Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jul 15 12:06:41 UTC 2020


On 15/07/2020 11:50, Chris Wilson wrote:
> Currently, we use i915_request_completed() directly in
> i915_request_wait() and follow up with a manual invocation of
> dma_fence_signal(). This appears to cause a large number of contentions
> on i915_request.lock as when the process is woken up after the fence is
> signaled by an interrupt, we will then try and call dma_fence_signal()
> ourselves while the signaler is still holding the lock.
> dma_fence_is_signaled() has the benefit of checking the
> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> avoids most of that contention.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..bb4eb1a8780e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>   	return this_cpu != cpu;
>   }
>   
> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>   {
>   	unsigned long timeout_ns;
>   	unsigned int cpu;
> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>   	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>   	timeout_ns += local_clock_ns(&cpu);
>   	do {
> -		if (i915_request_completed(rq))
> +		if (dma_fence_is_signaled(&rq->fence))
>   			return true;
>   
>   		if (signal_pending_state(state, current))
> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>   	 * duration, which we currently lack.
>   	 */
>   	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> -	    __i915_spin_request(rq, state)) {
> -		dma_fence_signal(&rq->fence);
> +	    __i915_spin_request(rq, state))
>   		goto out;
> -	}
>   
>   	/*
>   	 * This client is about to stall waiting for the GPU. In many cases
> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>   	for (;;) {
>   		set_current_state(state);
>   
> -		if (i915_request_completed(rq)) {
> -			dma_fence_signal(&rq->fence);
> +		if (dma_fence_is_signaled(&rq->fence))
>   			break;
> -		}
>   
>   		intel_engine_flush_submission(rq->engine);
>   
> 

In other words putting some latency back into the waiters, which is 
probably okay, since sync waits is not our primary model.

I have a slight concern about the remaining value of busy spinning if 
i915_request_completed check is removed from there as well. Of course it 
doesn't make sense to have different completion criteria between the 
two.. We could wait a bit longer if real check in busyspin said request 
is actually completed, just not signal it but wait for the breadcrumbs 
to do it.

Regards,

Tvrtko


More information about the Intel-gfx mailing list