[PATCH 75/76] RFC drm/i915: Load balancing across a virtual engine

Chris Wilson chris at chris-wilson.co.uk
Wed Jun 6 09:28:59 UTC 2018


Quoting Tvrtko Ursulin (2018-06-06 10:16:00)
> 
> On 02/06/2018 10:38, Chris Wilson wrote:
> > Having allowed the user to define the set of engines they want to
> > use, we go one step further and allow them to bind those engines
> > into a single virtual instance. Submitting a batch to the virtual
> > engine will then forward it to any one of the set in whatever manner
> > best distributes the load. The virtual engine has a single timeline
> > across all engines (it operates as a single queue), so it cannot by
> > itself run batches concurrently across multiple engines; that is
> > left to the user, who may submit multiple concurrent batches to
> > multiple queues. Multiple users will be load balanced across the
> > system.
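
Roughly, the shape being described is one request queue feeding N
sibling engines; as a sketch only (the names here are illustrative,
not the patch's actual layout):

#include <linux/list.h>

struct intel_engine_cs;

struct sketch_virtual_engine {
	struct list_head queue;		/* the single timeline of requests */
	unsigned int num_siblings;	/* how many physical engines below */
	struct intel_engine_cs *siblings[]; /* engines that may run them */
};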
> > 
> > The mechanism used for load balancing in this patch is a late
> > greedy balancer. When a request is ready for execution, it is added
> > to each sibling engine's queue, and when an engine is ready for its
> > next request it claims it from the virtual engine. The first engine
> > to do so wins, i.e. the request is executed at the earliest
> > opportunity (idle moment) in the system.
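
As a sketch of the claim step, assuming an atomic ownership flag (the
names and the cmpxchg approach are purely illustrative; the patch does
this through the execlists submission path):

#include <linux/atomic.h>

struct sketch_request {
	atomic_t claimed;	/* 0 until one sibling wins the race */
};

/* Called by each sibling when it is ready for its next request; the
 * first engine to flip 0 -> 1 owns the request, the losers back off
 * and look for other work. */
static bool sketch_claim(struct sketch_request *rq)
{
	return atomic_cmpxchg(&rq->claimed, 0, 1) == 0;
}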
> > 
> > As not all HW is created equal, the user is still able to skip the
> > virtual engine and execute the batch on a specific engine, all
> > within the same queue. It will then be executed in order on the
> > correct engine, with execution on the other virtual engines being
> > moved away due to the load detection.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > Opens:
> >   - virtual takes priority
> >   - rescheduling after being gazumped
> >   - eliminating the irq
> > ---
> >   drivers/gpu/drm/i915/i915_gem.h            |   5 +
> >   drivers/gpu/drm/i915/i915_gem_context.c    |  81 ++++-
> >   drivers/gpu/drm/i915/i915_request.c        |   2 +-
> >   drivers/gpu/drm/i915/intel_engine_cs.c     |   3 +-
> >   drivers/gpu/drm/i915/intel_lrc.c           | 393 ++++++++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.h           |   6 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.h    |   9 +
> >   drivers/gpu/drm/i915/selftests/intel_lrc.c | 177 ++++++++++
> >   include/uapi/drm/i915_drm.h                |  27 ++
> >   9 files changed, 697 insertions(+), 6 deletions(-)
> > 
> 
> [snip]
> 
> > +struct intel_engine_cs *
> > +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> > +                            struct intel_engine_cs **siblings,
> > +                            unsigned int count)
> > +{
> > +     struct virtual_engine *ve;
> > +     unsigned int n;
> > +     int err;
> > +
> > +     if (!count)
> > +             return ERR_PTR(-EINVAL);
> > +
> > +     ve = kzalloc(sizeof(*ve) + count * sizeof(*ve->siblings), GFP_KERNEL);
> > +     if (!ve)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     kref_init(&ve->kref);
> > +     ve->base.i915 = ctx->i915;
> > +     ve->base.id = -1;
> 
> 1)
> 
> I had the idea to add a new engine virtual class, and set instances to 
> real classes:
> 
>         ve->(uabi_)class = <CLASS_VIRTUAL>;
>         ve->instance = parent->class;
> 
> That would work fine in tracepoints (just need to remap class to uabi 
> class for virtual engines).

Though conceptually it may be bonkers, are we ever going to be able to
mix classes? e.g. veng over bcs+vcs for very simple testcases like wsim.

For simplicity, I'd set ve->uabi_class = VIRTUAL, allowing us to use
ve->class to make our lives easier. Also, we would need to reserve
the id?

Just trying to strike the right balance for the restrictions.
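
i.e. something along these lines, as a sketch (I915_ENGINE_CLASS_VIRTUAL
is a made-up name here, not an existing uapi value):

	/* uabi reports "virtual"; internally we keep the real class */
	ve->base.class = siblings[0]->class;	/* common sibling class */
	ve->base.uabi_class = I915_ENGINE_CLASS_VIRTUAL; /* hypothetical */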
 
> 2)
> 
> And I think it would also work for the queued PMU. I was thinking to
> export virtual classes as vcs-* nodes, in comparison to the current
> vcs0-busy.
> 
> vcs-queued/runnable/running would then contain aggregated counts for
> all virtual engines, while vcsN-queued/.../... would contain only the
> non-virtual engine counts.

It's just finding them :)

i915->gt.class[].virtual_list.

Or just i915->gt.class[].engine_list and skip non-virtual.
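
Then the aggregation is just a list walk; a sketch assuming the
proposed per-class list exists (virtual_list, link and the counter
helper below are all hypothetical):

#include <linux/list.h>

static u64 sketch_class_queued(struct drm_i915_private *i915, int class)
{
	struct virtual_engine *ve;
	u64 sum = 0;

	/* walk only the virtual engines registered for this class */
	list_for_each_entry(ve, &i915->gt.class[class].virtual_list, link)
		sum += sketch_count_queued(ve); /* hypothetical counter */

	return sum;
}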
 
> It is a tiny bit hackish but we still get to export GPU load so sounds 
> okay to me.

Seems a reasonable argument. Restricting a veng to one class, and then
being able to summarise all vengs as one super-virtual instance, sounds
like a reasonable trade-off and selling point.
-Chris

