[Intel-gfx] [PATCH 5/5] drm/i915/gt: Only transfer the virtual context to the new engine if active
Chris Wilson
chris at chris-wilson.co.uk
Thu Jul 16 17:28:19 UTC 2020
Quoting Tvrtko Ursulin (2020-07-16 16:37:25)
>
> On 16/07/2020 12:33, Chris Wilson wrote:
> > One more complication of preempt-to-busy with respect to the virtual
> > engine is that we may have retired the last request along the virtual
> > engine at the same time as preparing to submit the completed request to
> > a new engine. That submit will be shortcircuited, but not before we have
> > updated the context with the new register offsets and marked the virtual
> > engine as bound to the new engine (by calling swap on ve->siblings[]).
> > As we may have just retired the completed request, we may also be in the
> > middle of calling intel_context_exit() to turn off the power management
>
> virtual_context_exit
>
> > associated with the virtual engine, and that in turn walks the
> > ve->siblings[]. If we happen to call swap() on the array as we walk, we
> > will call intel_engine_pm_put() twice on the same engine.
> >
> > In this patch, we prevent this by only updating the bound engine after a
> > successful submission which weeds out the already completed requests.
> >
> > Alternatively, we could walk a non-volatile array for the pm, such as
> > using the engine->mask. The small advantage to performing the update
> > after the submit is that we then only have to do a swap for active
> > requests.
> >
> > Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> > References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine"
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 65 ++++++++++++++++++-----------
> > 1 file changed, 40 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 88a5c155154d..5e8278e8ac79 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1805,6 +1805,33 @@ static bool virtual_matches(const struct virtual_engine *ve,
> > return true;
> > }
> >
> > +static void virtual_xfer_context(struct virtual_engine *ve,
> > + struct intel_engine_cs *engine)
> > +{
> > + unsigned int n;
> > +
> > + if (likely(engine == ve->siblings[0]))
> > + return;
> > +
> > + GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > + if (!intel_engine_has_relative_mmio(engine))
> > + virtual_update_register_offsets(ve->context.lrc_reg_state,
> > + engine);
> > +
> > + /*
> > + * Move the bound engine to the top of the list for
> > + * future execution. We then kick this tasklet first
> > + * before checking others, so that we preferentially
> > + * reuse this set of bound registers.
> > + */
> > + for (n = 1; n < ve->num_siblings; n++) {
> > + if (ve->siblings[n] == engine) {
> > + swap(ve->siblings[n], ve->siblings[0]);
> > + break;
> > + }
> > + }
> > +}
> > +
> > #define for_each_waiter(p__, rq__) \
> > list_for_each_entry_lockless(p__, \
> > &(rq__)->sched.waiters_list, \
> > @@ -2253,35 +2280,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> > WRITE_ONCE(rq->engine, engine);
> >
> > - if (engine != ve->siblings[0]) {
> > - u32 *regs = ve->context.lrc_reg_state;
> > - unsigned int n;
> > -
> > - GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > -
> > - if (!intel_engine_has_relative_mmio(engine))
> > - virtual_update_register_offsets(regs,
> > - engine);
> > -
> > + if (__i915_request_submit(rq)) {
> > /*
> > - * Move the bound engine to the top of the list
> > - * for future execution. We then kick this
> > - * tasklet first before checking others, so that
> > - * we preferentially reuse this set of bound
> > - * registers.
> > + * Only after we confirm that we will submit
> > + * this request (i.e. it has not already
> > + * completed), do we want to update the context.
> > + *
> > + * This serves two purposes. It avoids
> > + * unnecessary work if we are resubmitting an
> > + * already completed request after timeslicing.
> > + * But more importantly, it prevents us altering
> > + * ve->siblings[] on an idle context, where
> > + * we may be using ve->siblings[] in
> > + * virtual_context_enter / virtual_context_exit.
> > */
> > - for (n = 1; n < ve->num_siblings; n++) {
> > - if (ve->siblings[n] == engine) {
> > - swap(ve->siblings[n],
> > - ve->siblings[0]);
> > - break;
> > - }
> > - }
> > -
> > + virtual_xfer_context(ve, engine);
> > GEM_BUG_ON(ve->siblings[0] != engine);
> > - }
> >
> > - if (__i915_request_submit(rq)) {
> > submit = true;
> > last = rq;
> > }
> >
>
> This was nasty indeed. How did you manage to find this?
Staring the traces, playing games of what-if. The clue was a bunch of
retirement and parking (too early due to the bad wakeref count) around
the very occasional bouncing virtual engine.
> I think it is okay to do the update once we know submit is doing ahead,
> but in the light of this "gotcha" I think it would also be good to start
> using for_each_engine_masked in virtual_context_enter/exit.
I am tempted, but I want to avoid further reliance on ve->base.gt. That
may be a moot point already, but I hope not.
-Chris
More information about the Intel-gfx
mailing list