[PATCH 75/76] RFC drm/i915: Load balancing across a virtual engine

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jun 6 09:16:00 UTC 2018


On 02/06/2018 10:38, Chris Wilson wrote:
> Having allowed the user to define the set of engines they want to use,
> we go one step further and allow them to bind those engines into a
> single virtual instance. Submitting a batch to the virtual engine will
> then forward it to any one of the set in a manner that best distributes
> the load.  The virtual engine has a single timeline across all engines
> (it operates as a single queue), so it cannot concurrently run batches
> across multiple engines by itself; achieving that is left to the user,
> who can submit multiple concurrent batches to multiple queues. Multiple
> users will be load balanced across the system.
> 
> The mechanism used for load balancing in this patch is a late greedy
> balancer. When a request is ready for execution, it is added to each
> engine's queue, and when an engine is ready for its next request it
> claims it from the virtual engine. The first engine to do so wins, i.e.
> the request is executed at the earliest opportunity (idle moment) in the
> system.
> 
> As not all HW is created equal, the user is still able to skip the
> virtual engine and execute the batch on a specific engine, all within the
> same queue. It will then be executed in order on the correct engine,
> with execution on other virtual engines being moved away due to the load
> detection.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> Opens:
>   - virtual takes priority
>   - rescheduling after being gazumped
>   - eliminating the irq
> ---
>   drivers/gpu/drm/i915/i915_gem.h            |   5 +
>   drivers/gpu/drm/i915/i915_gem_context.c    |  81 ++++-
>   drivers/gpu/drm/i915/i915_request.c        |   2 +-
>   drivers/gpu/drm/i915/intel_engine_cs.c     |   3 +-
>   drivers/gpu/drm/i915/intel_lrc.c           | 393 ++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.h           |   6 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |   9 +
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 177 ++++++++++
>   include/uapi/drm/i915_drm.h                |  27 ++
>   9 files changed, 697 insertions(+), 6 deletions(-)
> 
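
The late greedy idea above, restated as a small sketch (ve_claimed and
the helper name below are made up for illustration, this is not the
patch's implementation): the ready request is visible to every sibling,
and the first engine to go idle wins it with a single atomic claim,
while the later engines simply skip it.

static bool ve_claim_for_engine(struct i915_request *rq,
				struct intel_engine_cs *engine)
{
	/* Hypothetical per-request owner field; the first engine to
	 * swap itself in executes the request, the others back off. */
	return cmpxchg(&rq->ve_claimed, NULL, engine) == NULL;
}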

[snip]

> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count)
> +{
> +	struct virtual_engine *ve;
> +	unsigned int n;
> +	int err;
> +
> +	if (!count)
> +		return ERR_PTR(-EINVAL);
> +
> +	ve = kzalloc(sizeof(*ve) + count * sizeof(*ve->siblings), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&ve->kref);
> +	ve->base.i915 = ctx->i915;
> +	ve->base.id = -1;

1)

I had the idea of adding a new virtual engine class, and setting the
instances to the real classes:

	ve->(uabi_)class = <CLASS_VIRTUAL>;
	ve->instance = parent->class;

That would work fine in tracepoints (we just need to remap the class to 
the uabi class for virtual engines).
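
Roughly like this in the tracepoint helpers (CLASS_VIRTUAL and the uabi
value below are illustrative placeholders, not from the patch):

static u16 trace_uabi_class(const struct intel_engine_cs *engine)
{
	/* A virtual engine carries the parent's real class in ->instance,
	 * so only the class itself needs remapping for userspace. */
	if (engine->class == CLASS_VIRTUAL)
		return I915_ENGINE_CLASS_VIRTUAL; /* hypothetical value */

	return engine->uabi_class;
}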

2)

And I think it would also work for the queued PMU counters. I was 
thinking of exporting the virtual classes as vcs-* nodes, in comparison 
to the current vcs0-busy.

vcs-queued/runnable/running would then contain aggregated counts for all 
virtual engines, while vcsN-queued/.../... would contain only the 
non-virtual engine counts.

It is a tiny bit hackish, but we still get to export the GPU load, so it 
sounds okay to me.
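
For the aggregation itself, something along these lines (the iterator
and the queued/runnable/running samples are only a sketch of the pending
PMU work, not existing code):

static u64 vcs_virtual_sample(struct drm_i915_private *i915,
			      unsigned int sample)
{
	struct intel_engine_cs *engine;
	u64 sum = 0;

	/* Virtual engines store the parent's class in ->instance, so this
	 * picks up every virtual engine backed by the video class, while
	 * the real vcsN nodes keep reporting only their own counts. */
	for_each_virtual_engine(i915, engine) /* hypothetical iterator */
		if (engine->instance == I915_ENGINE_CLASS_VIDEO)
			sum += engine->pmu.sample[sample].cur;

	return sum;
}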

Thoughts?

Regards,

Tvrtko

