[PATCH 75/76] RFC drm/i915: Load balancing across a virtual engine

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jun 6 10:27:02 UTC 2018


On 06/06/2018 10:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 10:16:00)
>>
>> On 02/06/2018 10:38, Chris Wilson wrote:
>>> Having allowed the user to define a set of engines that they will want
>>> to only use, we go one step further and allow them to bind those engines
>>> into a single virtual instance. Submitting a batch to the virtual engine
>>> will then forward it to any one of the set in a manner as best to
>>> distribute load.  The virtual engine has a single timeline across all
>>> engines (it operates as a single queue), so it is not able to concurrently
>>> run batches across multiple engines by itself; that is left up to the user
>>> to submit multiple concurrent batches to multiple queues. Multiple users
>>> will be load balanced across the system.
>>>
>>> The mechanism used for load balancing in this patch is a late greedy
>>> balancer. When a request is ready for execution, it is added to each
>>> engine's queue, and when an engine is ready for its next request it
>>> claims it from the virtual engine. The first engine to do so, wins, i.e.
>>> the request is executed at the earliest opportunity (idle moment) in the
>>> system.
>>>
>>> As not all HW is created equal, the user is still able to skip the
>>> virtual engine and execute the batch on a specific engine, all within the
>>> same queue. It will then be executed in order on the correct engine,
>>> with execution on other virtual engines being moved away due to the load
>>> detection.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> Opens:
>>>    - virtual takes priority
>>>    - rescheduling after being gazumped
>>>    - eliminating the irq
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.h            |   5 +
>>>    drivers/gpu/drm/i915/i915_gem_context.c    |  81 ++++-
>>>    drivers/gpu/drm/i915/i915_request.c        |   2 +-
>>>    drivers/gpu/drm/i915/intel_engine_cs.c     |   3 +-
>>>    drivers/gpu/drm/i915/intel_lrc.c           | 393 ++++++++++++++++++++-
>>>    drivers/gpu/drm/i915/intel_lrc.h           |   6 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h    |   9 +
>>>    drivers/gpu/drm/i915/selftests/intel_lrc.c | 177 ++++++++++
>>>    include/uapi/drm/i915_drm.h                |  27 ++
>>>    9 files changed, 697 insertions(+), 6 deletions(-)
>>>
>>
>> [snip]
>>
>>> +struct intel_engine_cs *
>>> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
>>> +                            struct intel_engine_cs **siblings,
>>> +                            unsigned int count)
>>> +{
>>> +     struct virtual_engine *ve;
>>> +     unsigned int n;
>>> +     int err;
>>> +
>>> +     if (!count)
>>> +             return ERR_PTR(-EINVAL);
>>> +
>>> +     ve = kzalloc(sizeof(*ve) + count * sizeof(*ve->siblings), GFP_KERNEL);
>>> +     if (!ve)
>>> +             return ERR_PTR(-ENOMEM);
>>> +
>>> +     kref_init(&ve->kref);
>>> +     ve->base.i915 = ctx->i915;
>>> +     ve->base.id = -1;
>>
>> 1)
>>
>> I had the idea to add a new engine virtual class, and set instances to
>> real classes:
>>
>>          ve->(uabi_)class = <CLASS_VIRTUAL>;
>>          ve->instance = parent->class;
>>
>> That would work fine in tracepoints (just need to remap class to uabi
>> class for virtual engines).
> 
> Though conceptually it may be bonkers, are we ever going to be able to
> mix classes? e.g. veng over bcs+vcs for very simple testcases like wsim.

I have settled for no mixing of classes. We started with that, apart from 
a period where you wanted to mix, but then you also changed your mind, no?
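
For the create path I am thinking of nothing more than a helper along 
these lines (just a sketch, the helper name is made up):

	static bool siblings_single_class(struct intel_engine_cs **siblings,
					  unsigned int count)
	{
		unsigned int n;

		/* A virtual engine may only span engines of one class. */
		for (n = 1; n < count; n++) {
			if (siblings[n]->class != siblings[0]->class)
				return false;
		}

		return true;
	}

intel_execlists_create_virtual() would then just return ERR_PTR(-EINVAL) 
when the check fails, before allocating anything.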

> For simplicity set ve->uabi_class = VIRTUAL, allowing us to use ve->class
> to make our lives easier. Also we would need to reserve the id?

engine->id can remain -1 I think.

Not sure what you were aiming at with uabi_class and class. If we 
reserve a virtual class in both namespaces, do you see a problem?
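
To illustrate, something like this is what I have in mind (all names and 
values below are made up, just to show both namespaces getting a reserved 
entry):

	/* uapi namespace (hypothetical value): */
	#define I915_ENGINE_CLASS_VIRTUAL	5

	/* internal namespace (hypothetical value): */
	#define VIRTUAL_CLASS			5

	ve->base.class = VIRTUAL_CLASS;
	ve->base.uabi_class = I915_ENGINE_CLASS_VIRTUAL;
	ve->base.instance = siblings[0]->class; /* real class of the siblings */
	ve->base.id = -1;			/* no engine id reserved */

Tracepoints and the PMU could then key off the class alone.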

> Just trying to strike the right balance for the restrictions.
>   
>> 2)
>>
>> And I think it would also work for the queued PMU. I was thinking to export
>> virtual classes as vcs-* nodes, in comparison to the current vcs0-busy.
>>
>> vcs-queued/runnable/running would then contain aggregated counts for all
>> virtual engines, while the vcsN-queued/.../... would contain only non-virtual
>> engine counts.
> 
> It's just finding them :)
> 
> i915->gt.class[].virtual_list.
> 
> Or just i915->gt.class[].engine_list and skip non-virtual.

Not sure if I would have to find them. I was hoping the existing "hooks" 
would do the accounting for me by virtue of something like:

	if (engine->class == VIRTUAL)
		atomic_inc(&i915->virtual_stats->runnable);
	else
		atomic_inc(&engine->request_stats->runnable);

Etc..

There may be details for sure, but I do hope it will be possible.
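
Spelled out a bit more, I imagine the hook looking roughly like this 
(virtual_stats, request_stats and the runnable counter are placeholder 
names from the pseudo-code above, and VIRTUAL_CLASS is the hypothetical 
class from earlier):

	static void request_stats_runnable_inc(struct i915_request *rq)
	{
		struct intel_engine_cs *engine = rq->engine;
		struct drm_i915_private *i915 = engine->i915;

		if (engine->class == VIRTUAL_CLASS)
			/* One aggregated counter for all virtual engines. */
			atomic_inc(&i915->virtual_stats->runnable);
		else
			/* Per-engine counter, as today. */
			atomic_inc(&engine->request_stats->runnable);
	}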

>   
>> It is a tiny bit hackish but we still get to export GPU load so sounds
>> okay to me.
> 
> Seems a reasonable argument. Restricting veng to one class, then being
> able to summarise all vengs as one super-virtual instance, sounds a
> reasonable trade-off and selling point.

Cool!

Regards,

Tvrtko
