[PATCH 75/76] RFC drm/i915: Load balancing across a virtual engine
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jun 6 09:16:00 UTC 2018
On 02/06/2018 10:38, Chris Wilson wrote:
> Having allowed the user to define a set of engines that they will want
> to only use, we go one step further and allow them to bind those engines
> into a single virtual instance. Submitting a batch to the virtual engine
> will then forward it to any one of the set so as to best distribute
> load. The virtual engine has a single timeline across all
> engines (it operates as a single queue), so it is not able to concurrently
> run batches across multiple engines by itself; it is left to the user
> to submit multiple concurrent batches to multiple queues. Multiple users
> will be load balanced across the system.
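
To check I am following the model: one virtual engine is one timeline,
so userspace gets real parallelism by spreading work over several
contexts/queues. Roughly like this from the userspace side (sketch
only, placeholder names, not the uAPI from this patch):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Sketch: each ctx_id names a context whose engines have been bound
 * into a virtual engine (via the new uAPI, elided here). */
static void submit_spread(int fd, const uint32_t *ctx_ids, unsigned int n,
			  struct drm_i915_gem_execbuffer2 *eb)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		eb->rsvd1 = ctx_ids[i]; /* independent queue/timeline */
		drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb);
	}
	/* Each queue balances on its own; concurrency across physical
	 * engines comes from having n > 1 queues in flight. */
}
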
>
> The mechanism used for load balancing in this patch is a late greedy
> balancer. When a request is ready for execution, it is added to each
> engine's queue, and when an engine is ready for its next request it
> claims it from the virtual engine. The first engine to do so wins, i.e.
> the request is executed at the earliest opportunity (idle moment) in the
> system.
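
So the claim is effectively a race between the siblings' dequeue paths,
something like this if I read it right (simplified sketch, invented
helper names, not the code from the patch):

/* The virtual engine exposes one ready request; every sibling tries
 * to take it when it goes looking for more work. */
static struct i915_request *ve_claim(struct virtual_engine *ve)
{
	/* First sibling here wins the request, the rest see NULL. */
	return xchg(&ve->request, NULL);
}

static void sibling_next_request(struct intel_engine_cs *engine,
				 struct virtual_engine *ve)
{
	struct i915_request *rq = ve_claim(ve);

	if (rq)
		submit_to(engine, rq); /* stand-in for the submit path */
	/* else another sibling got there first; run the local queue */
}
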
>
> As not all HW is created equal, the user is still able to skip the
> virtual engine and execute the batch on a specific engine, all within the
> same queue. It will then be executed in order on the correct engine,
> with execution on other virtual engines being moved away due to the load
> detection.
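
And from userspace that presumably looks like choosing the engine
selector per execbuf while staying on the one context, e.g. (the
EXEC_ENGINE_* flag names are made up for illustration):

/* Same context == same queue, so ordering is kept even when mixing
 * balanced and pinned submissions: */
eb.rsvd1 = ctx_id;

eb.flags = EXEC_ENGINE_VIRTUAL;	/* any idle sibling may take it */
drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);

eb.flags = EXEC_ENGINE_VCS1;	/* pinned to vcs1, still in order */
drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);
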
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> Opens:
> - virtual takes priority
> - rescheduling after being gazumped
> - eliminating the irq
> ---
> drivers/gpu/drm/i915/i915_gem.h            |   5 +
> drivers/gpu/drm/i915/i915_gem_context.c    |  81 ++++-
> drivers/gpu/drm/i915/i915_request.c        |   2 +-
> drivers/gpu/drm/i915/intel_engine_cs.c     |   3 +-
> drivers/gpu/drm/i915/intel_lrc.c           | 393 ++++++++++++++++++++-
> drivers/gpu/drm/i915/intel_lrc.h           |   6 +
> drivers/gpu/drm/i915/intel_ringbuffer.h    |   9 +
> drivers/gpu/drm/i915/selftests/intel_lrc.c | 177 ++++++++++
> include/uapi/drm/i915_drm.h                |  27 ++
> 9 files changed, 697 insertions(+), 6 deletions(-)
>
[snip]
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count)
> +{
> +	struct virtual_engine *ve;
> +	unsigned int n;
> +	int err;
> +
> +	if (!count)
> +		return ERR_PTR(-EINVAL);
> +
> +	ve = kzalloc(sizeof(*ve) + count * sizeof(*ve->siblings), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&ve->kref);
> +	ve->base.i915 = ctx->i915;
> +	ve->base.id = -1;
1)

I had the idea of adding a new virtual engine class, with the instance
set to the real class:

ve->(uabi_)class = <CLASS_VIRTUAL>;
ve->instance = parent->class;

That would work fine in tracepoints (we would just need to remap class
to uabi class for virtual engines).
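
Roughly like so (sketch; CLASS_VIRTUAL and the lookup table are
invented for illustration):

#define CLASS_VIRTUAL 5 /* invented: one past the real classes */

static u16 trace_engine_class(const struct intel_engine_cs *engine)
{
	/* For virtual engines instance holds the real class, so the
	 * tracepoint can report the uabi class of the engines the
	 * virtual one fronts for. */
	if (engine->class == CLASS_VIRTUAL)
		return uabi_class_map[engine->instance]; /* invented map */

	return engine->uabi_class;
}
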
2)

I think it would also work for the queued PMU counters. I was thinking
of exporting virtual classes as vcs-* nodes, in contrast to the current
per-engine vcs0-busy style. vcs-queued/runnable/running would then
contain aggregated counts for all virtual engines, while
vcsN-queued/.../... would contain only non-virtual engine counts.

It is a tiny bit hackish, but we still get to export GPU load, so it
sounds okay to me.
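
In other words something like this (sketch; the virtual engine list and
stats counters are invented for illustration):

static u64 vcs_queued(struct drm_i915_private *i915)
{
	struct virtual_engine *ve;
	u64 sum = 0;

	/* Aggregate over every virtual engine fronting for vcs; the
	 * per-engine vcsN-queued counters would skip these requests. */
	list_for_each_entry(ve, &i915->virtual_engines, link)
		if (ve->instance == I915_ENGINE_CLASS_VIDEO)
			sum += atomic_read(&ve->stats.queued);

	return sum;
}
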
Thoughts?
Regards,
Tvrtko