[Intel-gfx] [PATCH 5/5] drm/i915/gt: Only transfer the virtual context to the new engine if active

Thu Jul 16 17:28:19 UTC 2020

Quoting Tvrtko Ursulin (2020-07-16 16:37:25)
> 
> On 16/07/2020 12:33, Chris Wilson wrote:
> > One more complication of preempt-to-busy with respect to the virtual
> > engine is that we may have retired the last request along the virtual
> > engine at the same time as preparing to submit the completed request to
> > a new engine. That submit will be shortcircuited, but not before we have
> > updated the context with the new register offsets and marked the virtual
> > engine as bound to the new engine (by calling swap on ve->siblings[]).
> > As we may have just retired the completed request, we may also be in the
> > middle of calling intel_context_exit() to turn off the power management
> 
> virtual_context_exit
> 
> > associated with the virtual engine, and that in turn walks the
> > ve->siblings[]. If we happen to call swap() on the array as we walk, we
> > will call intel_engine_pm_put() twice on the same engine.
> > 
> > In this patch, we prevent this by only updating the bound engine after a
> > successful submission which weeds out the already completed requests.
> > 
> > Alternatively, we could walk a non-volatile array for the pm, such as
> > using the engine->mask. The small advantage to performing the update
> > after the submit is that we then only have to do a swap for active
> > requests.
> > 
> > Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> > References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine"
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana at intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 65 ++++++++++++++++++-----------
> >   1 file changed, 40 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 88a5c155154d..5e8278e8ac79 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1805,6 +1805,33 @@ static bool virtual_matches(const struct virtual_engine *ve,
> >       return true;
> >   }
> >   
> > +static void virtual_xfer_context(struct virtual_engine *ve,
> > +                              struct intel_engine_cs *engine)
> > +{
> > +     unsigned int n;
> > +
> > +     if (likely(engine == ve->siblings[0]))
> > +             return;
> > +
> > +     GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > +     if (!intel_engine_has_relative_mmio(engine))
> > +             virtual_update_register_offsets(ve->context.lrc_reg_state,
> > +                                             engine);
> > +
> > +     /*
> > +      * Move the bound engine to the top of the list for
> > +      * future execution. We then kick this tasklet first
> > +      * before checking others, so that we preferentially
> > +      * reuse this set of bound registers.
> > +      */
> > +     for (n = 1; n < ve->num_siblings; n++) {
> > +             if (ve->siblings[n] == engine) {
> > +                     swap(ve->siblings[n], ve->siblings[0]);
> > +                     break;
> > +             }
> > +     }
> > +}
> > +
> >   #define for_each_waiter(p__, rq__) \
> >       list_for_each_entry_lockless(p__, \
> >                                    &(rq__)->sched.waiters_list, \
> > @@ -2253,35 +2280,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                       GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> >                       WRITE_ONCE(rq->engine, engine);
> >   
> > -                     if (engine != ve->siblings[0]) {
> > -                             u32 *regs = ve->context.lrc_reg_state;
> > -                             unsigned int n;
> > -
> > -                             GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > -
> > -                             if (!intel_engine_has_relative_mmio(engine))
> > -                                     virtual_update_register_offsets(regs,
> > -                                                                     engine);
> > -
> > +                     if (__i915_request_submit(rq)) {
> >                               /*
> > -                              * Move the bound engine to the top of the list
> > -                              * for future execution. We then kick this
> > -                              * tasklet first before checking others, so that
> > -                              * we preferentially reuse this set of bound
> > -                              * registers.
> > +                              * Only after we confirm that we will submit
> > +                              * this request (i.e. it has not already
> > +                              * completed), do we want to update the context.
> > +                              *
> > +                              * This serves two purposes. It avoids
> > +                              * unnecessary work if we are resubmitting an
> > +                              * already completed request after timeslicing.
> > +                              * But more importantly, it prevents us altering
> > +                              * ve->siblings[] on an idle context, where
> > +                              * we may be using ve->siblings[] in
> > +                              * virtual_context_enter / virtual_context_exit.
> >                                */
> > -                             for (n = 1; n < ve->num_siblings; n++) {
> > -                                     if (ve->siblings[n] == engine) {
> > -                                             swap(ve->siblings[n],
> > -                                                  ve->siblings[0]);
> > -                                             break;
> > -                                     }
> > -                             }
> > -
> > +                             virtual_xfer_context(ve, engine);
> >                               GEM_BUG_ON(ve->siblings[0] != engine);
> > -                     }
> >   
> > -                     if (__i915_request_submit(rq)) {
> >                               submit = true;
> >                               last = rq;
> >                       }
> > 
> 
> This was nasty indeed. How did you manage to find this?

Staring the traces, playing games of what-if. The clue was a bunch of
retirement and parking (too early due to the bad wakeref count) around
the very occasional bouncing virtual engine.

> I think it is okay to do the update once we know submit is doing ahead, 
> but in the light of this "gotcha" I think it would also be good to start 
> using for_each_engine_masked in virtual_context_enter/exit.

I am tempted, but I want to avoid further reliance on ve->base.gt. That
may be a moot point already, but I hope not.
-Chris