[Intel-gfx] [PATCH 29/32] drm/i915: Apply an execution_mask to the virtual_engine
Chris Wilson
chris at chris-wilson.co.uk
Wed Apr 17 12:46:20 UTC 2019
Quoting Tvrtko Ursulin (2019-04-17 13:35:29)
>
> On 17/04/2019 12:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-04-17 12:43:49)
> >>
> >> On 17/04/2019 08:56, Chris Wilson wrote:
> >>> Allow the user to direct which physical engines of the virtual engine
> >>> they wish to execute on, as sometimes it is necessary to override the
> >>> load balancing algorithm.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 58 +++++++++++
> >>> drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 +++++++++++++++++++++++++
> >>> drivers/gpu/drm/i915/i915_request.c | 1 +
> >>> drivers/gpu/drm/i915/i915_request.h | 3 +
> >>> 4 files changed, 193 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index d6efd6aa67cb..560a18bb4cbb 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -552,6 +552,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
> >>> intel_engine_context_out(rq->engine);
> >>> execlists_context_status_change(rq, status);
> >>> trace_i915_request_out(rq);
> >>> +
> >>> + /*
> >>> + * If this is part of a virtual engine, its next request may have
> >>> + * been blocked waiting for access to the active context. We have
> >>> + * to kick all the siblings again in case we need to switch (e.g.
> >>> + * the next request is not runnable on this engine). Hopefully,
> >>> + * we will already have submitted the next request before the
> >>> + * tasklet runs and do not need to rebuild each virtual tree
> >>> + * and kick everyone again.
> >>> + */
> >>> + if (rq->engine != rq->hw_context->engine)
> >>> + tasklet_schedule(&rq->hw_context->engine->execlists.tasklet);
> >>
> >> Is this needed only for non-default execution_mask? If so it would be
> >> good to limit it to avoid tasklet storm with plain veng.
> >
> > The issue is not just with this rq but the next one. If that has a
> > restricted mask that prevents it running on this engine, we may have
> > missed the opportunity to queue it (and so never run it under just the
> > right circumstances).
> >
> > Something like
> > to_virtual_engine(rq->hw_context->engine)->request->execution_mask & ~rq->execution_mask
> >
> > The storm isn't quite so bad, it's only on context-out, and we often do
> > succeed in keeping it busy. I was just trying to avoid pulling in ve here.
>
> What do you mean by the "pulling in ve" bit? Avoiding using
> to_virtual_engine like in the line you wrote above?
Just laziness, hiding behind the excuse of trying not to smear veng too
widely.
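
For the record, a rough sketch of the narrower kick being discussed
(untested; it reuses the to_virtual_engine() helper and the ve->request
peek from this series, and assumes the RCU protection discussed further
below):

	if (rq->engine != rq->hw_context->engine) {
		struct virtual_engine *ve =
			to_virtual_engine(rq->hw_context->engine);
		struct i915_request *next;

		/*
		 * Only kick the siblings when the next veng request is
		 * restricted to engines beyond this rq's mask; if it
		 * can run here, the plain dequeue picks it up anyway.
		 */
		rcu_read_lock();
		next = READ_ONCE(ve->request);
		if (next && next->execution_mask & ~rq->execution_mask)
			tasklet_schedule(&ve->base.execlists.tasklet);
		rcu_read_unlock();
	}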
> >>> +
> >>> + rq = READ_ONCE(ve->request);
> >>> + if (!rq)
> >>> + return 0;
> >>> +
> >>> + /* The rq is ready for submission; rq->execution_mask is now stable. */
> >>> + mask = rq->execution_mask;
> >>> + if (unlikely(!mask)) {
> >>> + /* Invalid selection, submit to a random engine in error */
> >>> + i915_request_skip(rq, -ENODEV);
> >>
> >> When can this happen? It looks like if it can happen we should reject it
> >> earlier. Or if it can't then just assert.
> >
> > Many submit fences can end up with an intersection of 0. This is the
> > convenient point to do the rejection, as with any other asynchronous
> > error.
>
> Which ones are many? Why would we have uAPI which allows setting
> impossible things where all requests will fail with -ENODEV?
But we are rejecting them in the uAPI, right here. This is the earliest
point where all the information for a particular execbuf is available
and we have the means of reporting that back.
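
As an illustration (hypothetical engine masks, not taken from the
series): each submit fence narrows the request's execution_mask by
intersection, so two individually valid constraints can still leave
nothing in common:

	intel_engine_mask_t mask;

	mask  = BIT(VCS0) | BIT(VCS1);	/* veng spans vcs0 and vcs1 */
	mask &= BIT(VCS0);		/* one submit fence demands vcs0 */
	mask &= BIT(VCS1);		/* another demands vcs1 */

	/*
	 * mask == 0: no engine satisfies every constraint, so the
	 * request is skipped with -ENODEV here, the first point at
	 * which the full intersection is known.
	 */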
> >>> + mask = ve->siblings[0]->mask;
> >>> + }
> >>> +
> >>> + GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n",
> >>> + ve->base.name,
> >>> + rq->fence.context, rq->fence.seqno,
> >>> + mask, ve->base.execlists.queue_priority_hint);
> >>> +
> >>> + return mask;
> >>> +}
> >>> +
> >>> static void virtual_submission_tasklet(unsigned long data)
> >>> {
> >>> struct virtual_engine * const ve = (struct virtual_engine *)data;
> >>> const int prio = ve->base.execlists.queue_priority_hint;
> >>> + intel_engine_mask_t mask;
> >>> unsigned int n;
> >>>
> >>> + rcu_read_lock();
> >>> + mask = virtual_submission_mask(ve);
> >>> + rcu_read_unlock();
> >>
> >> What is the RCU for?
> >
> > Accessing ve->request. There's nothing stopping another engine from
> > spotting the ve->request still in its tree, submitting it and it being
> > retired all during the read here.
>
> AFAIU there can only be one instance of virtual_submission_tasklet per
> VE at a time and the code above is before the request is inserted into
> physical engine trees. So I don't get it.
But the veng is being utilized by the real engines concurrently; it is
they who take the ve->request, execute it and so may free it behind the
submission tasklet's back. The spinlock only comes into play later,
after we have decided there is a request ready.
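
Spelled out, the pattern the RCU protects is just the unlocked peek in
the tasklet above (same code as in the patch, comments mine):

	/*
	 * Any sibling may pluck ve->request from its tree, execute it
	 * and retire it while we are sampling it here, so the request
	 * is only guaranteed to stay valid for the duration of the
	 * read-side critical section.
	 */
	rcu_read_lock();
	rq = READ_ONCE(ve->request);
	if (rq)
		mask = rq->execution_mask; /* stable once rq is ready */
	rcu_read_unlock();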
> Hm.. but going back to the veng patch, there is a
> GEM_BUG_ON(ve->request) in virtual_submit_request. Why couldn't this be
> called multiple times in parallel for different requests?
Because we strictly ordered submission into the veng so that it only
considers one ready request at a time. Processing more requests
decreased global throughput, as load-balancing was then no longer
"late" (the physical engines would amalgamate all the ve requests into
one submit).
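
For context, the submission side that enforces that ordering looks
roughly like this (simplified from the veng patch and quoted from
memory, so treat it as a sketch rather than the exact code):

	static void virtual_submit_request(struct i915_request *rq)
	{
		struct virtual_engine *ve = to_virtual_engine(rq->engine);

		/*
		 * Strict ordering: the veng holds at most one ready
		 * request, deferring the choice of physical engine to
		 * the last moment ("late" load-balancing).
		 */
		GEM_BUG_ON(ve->request);

		ve->base.execlists.queue_priority_hint = rq_prio(rq);
		WRITE_ONCE(ve->request, rq);

		tasklet_schedule(&ve->base.execlists.tasklet);
	}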
-Chris