[PATCH 75/76] RFC drm/i915: Load balancing across a virtual engine
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jun 6 10:27:02 UTC 2018
On 06/06/2018 10:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 10:16:00)
>>
>> On 02/06/2018 10:38, Chris Wilson wrote:
>>> Having allowed the user to define a set of engines that they will want
>>> to only use, we go one step further and allow them to bind those engines
>>> into a single virtual instance. Submitting a batch to the virtual engine
>>> will then forward it to any one of the set in a manner as best to
>>> distribute load. The virtual engine has a single timeline across all
>>> engines (it operates as a single queue), so it is not able to concurrently
>>> run batches across multiple engines by itself; that is left up to the user
>>> to submit multiple concurrent batches to multiple queues. Multiple users
>>> will be load balanced across the system.
>>>
>>> The mechanism used for load balancing in this patch is a late greedy
>>> balancer. When a request is ready for execution, it is added to each
>>> engine's queue, and when an engine is ready for its next request it
>>> claims it from the virtual engine. The first engine to do so, wins, i.e.
>>> the request is executed at the earliest opportunity (idle moment) in the
>>> system.
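[The late greedy claim above can be sketched in a few lines of standalone
userspace C: a ready request is visible to every sibling, and the first
engine to swap an "owner" field from unclaimed to its own index wins. All
names here are illustrative, not the i915 implementation.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_UNCLAIMED (-1)

struct sketch_request {
	atomic_int owner; /* SKETCH_UNCLAIMED until a sibling claims it */
};

static void sketch_request_init(struct sketch_request *rq)
{
	atomic_init(&rq->owner, SKETCH_UNCLAIMED);
}

/* Returns true iff sibling @idx won the race for @rq. */
static bool sketch_claim(struct sketch_request *rq, int idx)
{
	int expected = SKETCH_UNCLAIMED;

	/* Exactly one sibling succeeds; the rest see a non-matching owner. */
	return atomic_compare_exchange_strong(&rq->owner, &expected, idx);
}
```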
>>>
>>> As not all HW is created equal, the user is still able to skip the
>>> virtual engine and execute the batch on a specific engine, all within the
>>> same queue. It will then be executed in order on the correct engine,
>>> with execution on other virtual engines being moved away due to the load
>>> detection.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> Opens:
>>> - virtual takes priority
>>> - rescheduling after being gazumped
>>> - eliminating the irq
>>> ---
>>> drivers/gpu/drm/i915/i915_gem.h | 5 +
>>> drivers/gpu/drm/i915/i915_gem_context.c | 81 ++++-
>>> drivers/gpu/drm/i915/i915_request.c | 2 +-
>>> drivers/gpu/drm/i915/intel_engine_cs.c | 3 +-
>>> drivers/gpu/drm/i915/intel_lrc.c | 393 ++++++++++++++++++++-
>>> drivers/gpu/drm/i915/intel_lrc.h | 6 +
>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 9 +
>>> drivers/gpu/drm/i915/selftests/intel_lrc.c | 177 ++++++++++
>>> include/uapi/drm/i915_drm.h | 27 ++
>>> 9 files changed, 697 insertions(+), 6 deletions(-)
>>>
>>
>> [snip]
>>
>>> +struct intel_engine_cs *
>>> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
>>> + struct intel_engine_cs **siblings,
>>> + unsigned int count)
>>> +{
>>> + struct virtual_engine *ve;
>>> + unsigned int n;
>>> + int err;
>>> +
>>> + if (!count)
>>> + return ERR_PTR(-EINVAL);
>>> +
>>> + ve = kzalloc(sizeof(*ve) + count * sizeof(*ve->siblings), GFP_KERNEL);
>>> + if (!ve)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + kref_init(&ve->kref);
>>> + ve->base.i915 = ctx->i915;
>>> + ve->base.id = -1;
>>
>> 1)
>>
>> I had the idea to add a new engine virtual class, and set instances to
>> real classes:
>>
>> ve->(uabi_)class = <CLASS_VIRTUAL>;
>> ve->instance = parent->class;
>>
>> That would work fine in tracepoints (just need to remap class to uabi
>> class for virtual engines).
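[The remap idea sketched above, in standalone form: the veng reports a
dedicated virtual class and reuses the instance field to carry its
siblings' class. The enum values and struct layout below are hypothetical;
the real uabi class values live in i915_drm.h.]

```c
#include <assert.h>

enum sketch_class {
	SKETCH_CLASS_RENDER,
	SKETCH_CLASS_COPY,
	SKETCH_CLASS_VIDEO,
	SKETCH_CLASS_VIRTUAL,	/* made-up value for a veng */
};

struct sketch_engine {
	enum sketch_class class;	/* SKETCH_CLASS_VIRTUAL for a veng */
	int instance;			/* for a veng: the siblings' class */
};

static void sketch_setup_virtual(struct sketch_engine *ve,
				 enum sketch_class sibling_class)
{
	/* The virtual engine advertises its parents' class via instance. */
	ve->class = SKETCH_CLASS_VIRTUAL;
	ve->instance = (int)sibling_class;
}
```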
>
> Though conceptually it may be bonkers, are we ever going to be able to
> mix classes? e.g. veng over bcs+vcs for very simple testcases like wsim.
I have settled on no mixing of classes. We started with that, apart from
a period where you wanted to mix, but then you changed your mind back, no?
> For simplicity, set ve->uabi_class = VIRTUAL, allowing us to use ve->class
> to make our lives easier. Also we would need to reserve the id?
engine->id can remain -1 I think.
Not sure what you were aiming at with uabi_class and class. If we
reserve a virtual class in both namespaces, do you see a problem?
> Just trying to strike the right balance for the restrictions.
>
>> 2)
>>
>> And I think it would also work for the queued PMU stats. I was thinking
>> of exporting virtual classes as vcs-* nodes, in comparison to the
>> current vcs0-busy.
>>
>> vcs-queued/runnable/running would then contain aggregated counts for all
>> virtual engines, while the vcsN-queued/.../... would contain only
>> non-virtual engine counts.
>
> It's just finding them :)
>
> i915->gt.class[].virtual_list.
>
> Or just i915->gt.class[].engine_list and skip non-virtual.
Not sure if I would have to find them. I was hoping the existing "hooks"
would do the accounting for me by virtue of something like:

	if (engine->class == VIRTUAL)
		atomic_inc(&i915->virtual_stats->runnable);
	else
		atomic_inc(&engine->request_stats->runnable);
Etc..
There may be details for sure, but I do hope it will be possible.
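[The hook idea can be exercised in a small userspace sketch: real engines
get per-class counters, while all virtual engines fold into one aggregated
bucket. All names and the layout are hypothetical.]

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_MAX_CLASSES	4
#define SKETCH_CLASS_VIRTUAL	3	/* made-up virtual class id */

struct sketch_stats {
	atomic_int runnable;
};

struct sketch_i915 {
	struct sketch_stats engine_stats[SKETCH_MAX_CLASSES];
	struct sketch_stats virtual_stats;	/* aggregated over all vengs */
};

static void sketch_mark_runnable(struct sketch_i915 *i915, int class)
{
	/* Virtual engines share one counter; real classes keep their own. */
	if (class == SKETCH_CLASS_VIRTUAL)
		atomic_fetch_add(&i915->virtual_stats.runnable, 1);
	else
		atomic_fetch_add(&i915->engine_stats[class].runnable, 1);
}
```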
>
>> It is a tiny bit hackish but we still get to export GPU load so sounds
>> okay to me.
>
> Seems a reasonable argument. Restricting veng to one class, then being
> able to summarise all vengs as one super-virtual instance, sounds like a
> reasonable trade-off and selling point.
Cool!
Regards,
Tvrtko