[Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jan 9 11:56:15 UTC 2019


On 07/01/2019 15:29, Chris Wilson wrote:
> In the continual quest to reduce the amount of global work required when
> submitting requests, replace i915_retire_requests() after allocation
> failure to retiring just our ring.
> 
> References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 33 +++++++++++++++++++++--------
>   1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 1e158eb8cb97..9ba218c6029b 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   	return NOTIFY_DONE;
>   }
>   
> +static noinline struct i915_request *
> +i915_request_alloc_slow(struct intel_context *ce)
> +{
> +	struct intel_ring *ring = ce->ring;
> +	struct i915_request *rq, *next;
> +
> +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> +		/* Ratelimit ourselves to prevent oom from malicious clients */
> +		if (&next->ring_link == &ring->request_list) {

list_is_last(&rq->ring_link, &ring->request_list) ?
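i.e. something like this (untested sketch of the same check, just using
the list.h helper instead of open-coding it):

		if (list_is_last(&rq->ring_link, &ring->request_list)) {
			cond_synchronize_rcu(rq->rcustate);
			break; /* keep the last objects for the next request */
		}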

> +			cond_synchronize_rcu(rq->rcustate);
> +			break; /* keep the last objects for the next request */
> +		}
> +
> +		if (!i915_request_completed(rq))
> +			break;
> +
> +		/* Retire our old requests in the hope that we free some */
> +		i915_request_retire(rq);

The RCU wait against the last submitted rq is also gone. Now it only
syncs against the next-to-last rq, unless there are more than two live
requests. Is this what you intended?

If the ring timeline is a list of r-r-r-R-R-R (r=completed, R=pending),
then it looks like it will not sync on anything.

And if the list is r-r-r-r, it will sync against a completed rq, which
I hope is a no-op; but still, the loop logic looks potentially dodgy.
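
One option, sketch only and untested, would be to sample the last
request's rcustate unconditionally before retiring, so the backpressure
still keys off the most recently submitted request:

	/* Ratelimit against the most recent submission, completed or not */
	if (!list_empty(&ring->request_list)) {
		struct i915_request *last;

		last = list_last_entry(&ring->request_list,
				       typeof(*last), ring_link);
		cond_synchronize_rcu(last->rcustate);
	}

	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
		if (list_is_last(&rq->ring_link, &ring->request_list))
			break; /* keep the last objects for the next request */

		if (!i915_request_completed(rq))
			break;

		/* Retire our old requests in the hope that we free some */
		i915_request_retire(rq);
	}

That would at least make the r-r-r-R-R-R case wait on something.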

It also has a higher-level vulnerability, I think: since we no longer
retire globally, one hog timeline can starve the rest.

Regards,

Tvrtko

> +	}
> +
> +	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> +}
> +
>   /**
>    * i915_request_alloc - allocate a request structure
>    *
> @@ -559,15 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq = kmem_cache_alloc(i915->requests,
>   			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>   	if (unlikely(!rq)) {
> -		i915_retire_requests(i915);
> -
> -		/* Ratelimit ourselves to prevent oom from malicious clients */
> -		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
> -					 &i915->drm.struct_mutex);
> -		if (rq)
> -			cond_synchronize_rcu(rq->rcustate);
> -
> -		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
> +		rq = i915_request_alloc_slow(ce);
>   		if (!rq) {
>   			ret = -ENOMEM;
>   			goto err_unreserve;
> 

