[Intel-gfx] [PATCH 4/6] drm/i915: Priority boost for new clients
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Aug 6 09:47:44 UTC 2018
On 06/08/2018 09:30, Chris Wilson wrote:
> Taken from an idea used for FQ_CODEL, we give new request flows a
> priority boost. These flows are likely to correspond with interactive
> tasks and so be more latency sensitive than the long queues. As soon as
> the client has more than one request in the queue, further requests are
> not boosted and it settles down into ordinary steady state behaviour.
> Such small kicks dramatically help combat the starvation issue, by
> allowing each client the opportunity to run even when the system is
> under heavy throughput load (within the constraints of the user
> selected priority).
>
> Testcase: igt/benchmarks/rrul
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_request.c | 16 ++++++++++++++--
> drivers/gpu/drm/i915/i915_scheduler.h | 4 +++-
> 2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 67fd2ec75d78..63785a5c0418 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1126,8 +1126,20 @@ void i915_request_add(struct i915_request *request)
> */
> local_bh_disable();
> rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> - if (engine->schedule)
> - engine->schedule(request, &request->gem_context->sched);
> + if (engine->schedule) {
> + struct i915_sched_attr attr = request->gem_context->sched;
> +
> + /*
> + * Boost priorities to new clients (new request flows).
> + *
> + * Allow interactive/synchronous clients to jump ahead of
> + * the bulk clients. (FQ_CODEL)
> + */
> + if (!prev || i915_request_completed(prev))
> + attr.priority |= I915_PRIORITY_NEWCLIENT;
> +
> + engine->schedule(request, &attr);
This AFAIR ended up in an unresolved discussion roughly a month ago.
My concern was two unprivileged clients: one (A) sending 1ms batches
back to back, with a gem_wait after each, and another (B) sending say
10ms batches. In this scenario A would always get the priority bump
and would keep preempting B, whose 10ms batch would then effectively
never complete. So a serious unfairness/starvation issue.
Or a cross-engine "exploit": a client submits a heavy render workload
and is then able to jump ahead of the other render clients by
submitting a no-op batch to an idle engine, with null/fake dependencies
set up between the two. Effectively defeating the need to be
privileged to elevate priority.
Sorry I can only think of the problems at the moment!
Regards,
Tvrtko
> + }
> rcu_read_unlock();
> i915_sw_fence_commit(&request->submit);
> local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 7edfad0abfd7..e9fb6c1d5e42 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -19,12 +19,14 @@ enum {
> I915_PRIORITY_INVALID = INT_MIN
> };
>
> -#define I915_USER_PRIORITY_SHIFT 0
> +#define I915_USER_PRIORITY_SHIFT 1
> #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
>
> #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
> #define I915_PRIORITY_MASK (-I915_PRIORITY_COUNT)
>
> +#define I915_PRIORITY_NEWCLIENT BIT(0)
> +
> struct i915_sched_attr {
> /**
> * @priority: execution and service priority
>