[Intel-gfx] [PATCH 1/3] drm/i915: Limit the backpressure for i915_request allocation
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Sep 14 10:09:28 UTC 2018
On 14/09/2018 09:00, Chris Wilson wrote:
> If we try and fail to allocate an i915_request, we apply some
> backpressure on the clients to throttle the memory allocations coming
> from i915.ko. Currently, we wait until completely idle, but this is far
> too heavy and leads to some situations where the only escape is to
> declare a client hung and reset the GPU. The intent is to only ratelimit
> the allocation requests and to allow ourselves to recycle requests and
> memory from any long queues built up by a client hog.
>
> Although system memory is inherently a global resource, we don't want
> to overly penalize an unlucky client by making it pay the full price of
> reaping a hog. To reduce the influence of one client on another,
> instead of waiting for the entire GPU to idle we can impose a barrier
> on the local client alone. (One end goal for request allocation is
> scalability to many concurrent allocators; simultaneous execbufs.)
>
> To prevent ourselves from getting caught out by long-running requests
> (requests that may never finish without intervention from the very
> userspace we are blocking), we need to impose a finite timeout, ideally
> shorter than hangcheck. A long time ago Paul McKenney suggested that
> RCU users should ratelimit themselves through judicious use of
> cond_synchronize_rcu(). This gives us the opportunity to reduce our
> indefinite wait for the GPU to idle to a wait for the RCU grace period
> of the previous allocation along this timeline to expire, satisfying
> both the local and finite properties we desire for our ratelimiting.
>
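Side note for anyone following along: the get_state_synchronize_rcu() /
cond_synchronize_rcu() pair used below works roughly like this (a
minimal sketch with an illustrative variable name, not code from the
patch):

	unsigned long cookie;

	/* Snapshot the current RCU grace-period state. */
	cookie = get_state_synchronize_rcu();

	/* ... time passes, readers come and go ... */

	/*
	 * Returns immediately if a full grace period has already elapsed
	 * since the snapshot; otherwise waits like synchronize_rcu().
	 */
	cond_synchronize_rcu(cookie);

So the stall is bounded by at most one grace period, rather than by how
long the whole GPU takes to idle.
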
> There are still a few global steps (reclaim not least amongst those!)
> when we exhaust the immediate slab pool, but at least now the wait is
> itself decoupled from struct_mutex for our glorious, highly parallel
> future!
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
> drivers/gpu/drm/i915/i915_request.c | 14 ++++++++------
> drivers/gpu/drm/i915/i915_request.h | 8 ++++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 09ed48833b54..a492385b2089 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -732,13 +732,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> rq = kmem_cache_alloc(i915->requests,
> GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> if (unlikely(!rq)) {
> + i915_retire_requests(i915);
> +
> /* Ratelimit ourselves to prevent oom from malicious clients */
> - ret = i915_gem_wait_for_idle(i915,
> - I915_WAIT_LOCKED |
> - I915_WAIT_INTERRUPTIBLE,
> - MAX_SCHEDULE_TIMEOUT);
> - if (ret)
> - goto err_unreserve;
> + rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
> + &i915->drm.struct_mutex);
> + if (rq)
> + cond_synchronize_rcu(rq->rcustate);
>
> /*
> * We've forced the client to stall and catch up with whatever
> @@ -758,6 +758,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> }
> }
>
> + rq->rcustate = get_state_synchronize_rcu();
> +
> INIT_LIST_HEAD(&rq->active_list);
> rq->i915 = i915;
> rq->engine = engine;
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 9898301ab7ef..7fa94b024968 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -100,6 +100,14 @@ struct i915_request {
> struct i915_timeline *timeline;
> struct intel_signal_node signaling;
>
> + /*
> + * The RCU epoch at which this request was allocated. Used to
> + * judiciously apply backpressure on future allocations to ensure
> + * that under memory pressure there are sufficient RCU ticks for
> + * us to reclaim our RCU-protected slabs.
> + */
> + unsigned long rcustate;
> +
> /*
> * Fences for the various phases in the request's lifetime.
> *
>
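Putting the two hunks together, the allocation slow path now reads
something like the below (condensed by me for clarity; the second-chance
allocation and the err_unreserve error path are the function's existing
code):

	rq = kmem_cache_alloc(i915->requests,
			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	if (unlikely(!rq)) {
		/* Recycle whatever has already completed... */
		i915_retire_requests(i915);

		/*
		 * ...and throttle against the previous request on this
		 * timeline: wait for at most one RCU grace period measured
		 * from its allocation, instead of for the whole GPU to idle.
		 */
		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
					 &i915->drm.struct_mutex);
		if (rq)
			cond_synchronize_rcu(rq->rcustate);

		/* Second chance; this one may enter direct reclaim. */
		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
		if (!rq) {
			ret = -ENOMEM;
			goto err_unreserve;
		}
	}

	/* Record the epoch for the next allocation to throttle against. */
	rq->rcustate = get_state_synchronize_rcu();

That keeps the backpressure local to the offending client and bounded in
time, which is exactly the property the commit message asks for.
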
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko