[Intel-gfx] [PATCH 29/32] drm/i915: Apply an execution_mask to the virtual_engine
Chris Wilson
chris at chris-wilson.co.uk
Wed Apr 17 12:46:20 UTC 2019
Quoting Tvrtko Ursulin (2019-04-17 13:35:29)
>
> On 17/04/2019 12:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-04-17 12:43:49)
> >>
> >> On 17/04/2019 08:56, Chris Wilson wrote:
> >>> Allow the user to direct which physical engines of the virtual engine
> >>> they wish to execute on, as sometimes it is necessary to override the
> >>> load balancing algorithm.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 58 +++++++++++
> >>> drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 +++++++++++++++++++++++++
> >>> drivers/gpu/drm/i915/i915_request.c | 1 +
> >>> drivers/gpu/drm/i915/i915_request.h | 3 +
> >>> 4 files changed, 193 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index d6efd6aa67cb..560a18bb4cbb 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -552,6 +552,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
> >>> intel_engine_context_out(rq->engine);
> >>> execlists_context_status_change(rq, status);
> >>> trace_i915_request_out(rq);
> >>> +
> >>> + /*
> >>> + * If this is part of a virtual engine, its next request may have
> >>> + * been blocked waiting for access to the active context. We have
> >>> + * to kick all the siblings again in case we need to switch (e.g.
> >>> + * the next request is not runnable on this engine). Hopefully,
> >>> + * we will already have submitted the next request before the
> >>> + * tasklet runs and do not need to rebuild each virtual tree
> >>> + * and kick everyone again.
> >>> + */
> >>> + if (rq->engine != rq->hw_context->engine)
> >>> + tasklet_schedule(&rq->hw_context->engine->execlists.tasklet);
> >>
> >> Is this needed only for non-default execution_mask? If so it would be
> >> good to limit it to avoid tasklet storm with plain veng.
> >
> > The issue is not just with this rq but the next one. If that has a
> > restricted mask that prevents it running on this engine, we may have
> > missed the opportunity to queue it (and so never run it under just the
> > right circumstances).
> >
> > Something like
> > to_virtual_engine(rq->hw_context->engine)->request->execution_mask & ~rq->execution_mask
> >
> > The storm isn't quite so bad, it's only on context-out, and we often do
> > succeed in keeping it busy. I was just trying to avoid pulling in ve here.
>
> What do you mean by the "pulling in ve" bit? Avoiding using
> to_virtual_engine like in the line you wrote above?
Just laziness, hiding behind the excuse of trying not to smear veng too
widely.
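
For the record, a rough sketch of the narrower kick being discussed
(untested; it reuses the to_virtual_engine() helper and the ve->request
peek from this series, and assumes the RCU protection discussed further
below):

	if (rq->engine != rq->hw_context->engine) {
		struct virtual_engine *ve =
			to_virtual_engine(rq->hw_context->engine);
		struct i915_request *next;

		/*
		 * Only kick the siblings when the next veng request is
		 * restricted to engines beyond this rq's mask; if it
		 * can run here, the plain dequeue picks it up anyway.
		 */
		rcu_read_lock();
		next = READ_ONCE(ve->request);
		if (next && next->execution_mask & ~rq->execution_mask)
			tasklet_schedule(&ve->base.execlists.tasklet);
		rcu_read_unlock();
	}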
> >>> +
> >>> + rq = READ_ONCE(ve->request);
> >>> + if (!rq)
> >>> + return 0;
> >>> +
> >>> + /* The rq is ready for submission; rq->execution_mask is now stable. */
> >>> + mask = rq->execution_mask;
> >>> + if (unlikely(!mask)) {
> >>> + /* Invalid selection, submit to a random engine in error */
> >>> + i915_request_skip(rq, -ENODEV);
> >>
> >> When can this happen? It looks like if it can happen we should reject it
> >> earlier. Or if it can't then just assert.
> >
> > Many submit fences can end up with an intersection of 0. This is the
> > convenient point to do the rejection, as with any other asynchronous
> > error.
>
> Which ones are many? Why would we have uAPI which allows setting
> impossible things where all requests will fail with -ENODEV?
But we are rejecting them in the uAPI, right here. This is the earliest
point where all the information for a particular execbuf is available
and we have the means of reporting that back.
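
As an illustration (hypothetical engine masks, not taken from the
series): each submit fence narrows the request's execution_mask by
intersection, so two individually valid constraints can still leave
nothing in common:

	intel_engine_mask_t mask;

	mask  = BIT(VCS0) | BIT(VCS1);	/* veng spans vcs0 and vcs1 */
	mask &= BIT(VCS0);		/* one submit fence demands vcs0 */
	mask &= BIT(VCS1);		/* another demands vcs1 */

	/*
	 * mask == 0: no engine satisfies every constraint, so the
	 * request is skipped with -ENODEV here, the first point at
	 * which the full intersection is known.
	 */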
> >>> + mask = ve->siblings[0]->mask;
> >>> + }
> >>> +
> >>> + GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n",
> >>> + ve->base.name,
> >>> + rq->fence.context, rq->fence.seqno,
> >>> + mask, ve->base.execlists.queue_priority_hint);
> >>> +
> >>> + return mask;
> >>> +}
> >>> +
> >>> static void virtual_submission_tasklet(unsigned long data)
> >>> {
> >>> struct virtual_engine * const ve = (struct virtual_engine *)data;
> >>> const int prio = ve->base.execlists.queue_priority_hint;
> >>> + intel_engine_mask_t mask;
> >>> unsigned int n;
> >>>
> >>> + rcu_read_lock();
> >>> + mask = virtual_submission_mask(ve);
> >>> + rcu_read_unlock();
> >>
> >> What is the RCU for?
> >
> > Accessing ve->request. There's nothing stopping another engine from
> > spotting the ve->request still in its tree, submitting it and it being
> > retired all during the read here.
>
> AFAIU there can only be one instance of virtual_submission_tasklet per
> VE at a time and the code above is before the request is inserted into
> physical engine trees. So I don't get it.
But the veng is being utilized by the real engines concurrently; it is
they who take the ve->request, execute it and so may free it behind the
submission tasklet's back. The spinlock only comes into play later,
after we have decided there is a request ready.
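
Spelled out, the pattern the RCU protects is just the unlocked peek in
the tasklet above (same code as in the patch, comments mine):

	/*
	 * Any sibling may pluck ve->request from its tree, execute it
	 * and retire it while we are sampling it here, so the request
	 * is only guaranteed to stay valid for the duration of the
	 * read-side critical section.
	 */
	rcu_read_lock();
	rq = READ_ONCE(ve->request);
	if (rq)
		mask = rq->execution_mask; /* stable once rq is ready */
	rcu_read_unlock();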
> Hm.. but going back to the veng patch, there is a
> GEM_BUG_ON(ve->request) in virtual_submit_request. Why couldn't this be
> called multiple times in parallel for different requests?
Because we strictly ordered submission into the veng so that it only
considers one ready request at a time. Processing more requests
decreased global throughput, as load-balancing was then no longer
"late" (the physical engines would amalgamate all the ve requests into
one submit).
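
For context, the submission side that enforces that ordering looks
roughly like this (simplified from the veng patch and quoted from
memory, so treat it as a sketch rather than the exact code):

	static void virtual_submit_request(struct i915_request *rq)
	{
		struct virtual_engine *ve = to_virtual_engine(rq->engine);

		/*
		 * Strict ordering: the veng holds at most one ready
		 * request, deferring the choice of physical engine to
		 * the last moment ("late" load-balancing).
		 */
		GEM_BUG_ON(ve->request);

		ve->base.execlists.queue_priority_hint = rq_prio(rq);
		WRITE_ONCE(ve->request, rq);

		tasklet_schedule(&ve->base.execlists.tasklet);
	}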
-Chris