[Intel-gfx] [PATCH] drm/i915/ringbuffer: Brute force context restore

Mon Jun 11 13:48:08 UTC 2018

Quoting Tvrtko Ursulin (2018-06-11 14:30:44)
> 
> On 11/06/2018 11:48, Chris Wilson wrote:
> > An issue encountered with switching mm on gen7 is that the GPU likes to
> > hang (with the VS unit busy) when told to force restore the current
> > context. We can simply workaround this by substituting the
> > MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> > forcing the context to be saved and restored; thereby reloading the
> > PP_DIR registers and updating the modified page directory!
> > 
> > v2: Undo attempted optimisation in caller (Tvrtko)
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Matthew Auld <matthew.william.auld at gmail.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_ringbuffer.c | 27 +++++++++++++++++++++++++
> >   1 file changed, 27 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 65811e2fa7da..39108d8dcec5 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >               (HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
> >               INTEL_INFO(i915)->num_rings - 1 :
> >               0;
> > +     bool force_restore = false;
> >       int len;
> >       u32 *cs;
> >   
> > @@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >       len = 4;
> >       if (IS_GEN7(i915))
> >               len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> > +     if (flags & MI_FORCE_RESTORE) {
> > +             GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> > +             flags &= ~MI_FORCE_RESTORE;
> > +             force_restore = true;
> > +             len += 2;
> > +     }
> >   
> >       cs = intel_ring_begin(rq, len);
> >       if (IS_ERR(cs))
> > @@ -1495,6 +1502,26 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >               }
> >       }
> >   
> > +     if (force_restore) {
> > +             /*
> > +              * The HW doesn't handle being told to restore the current
> > +              * context very well. Quite often it likes goes to go off and
> > +              * sulk, especially when it is meant to be reloading PP_DIR.
> > +              * A very simple fix to force the reload is to simply switch
> > +              * away from the current context and back again.
> > +              *
> > +              * Note that the kernel_context will contain random state
> > +              * following the INHIBIT_RESTORE. We accept this since we
> > +              * never use the kernel_context state; it is merely a
> > +              * placeholder we use to flush other contexts.
> > +              */
> > +             *cs++ = MI_SET_CONTEXT;
> > +             *cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> > +                                                       engine)->state) |
> > +                     MI_MM_SPACE_GTT |
> > +                     MI_RESTORE_INHIBIT;
> > +     }
> > +
> >       *cs++ = MI_NOOP;
> >       *cs++ = MI_SET_CONTEXT;
> >       *cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
> > 
> 
> One more thing - I assume there is some performance penalty in switching 
> two times, and since the commit message says the issue is on Gen7 - 
> should you skip the double-switch on Gen6? Or when full ppgtt is not 
> enabled? (If that will be supported at all.)

(Hmm, apologies if I sent this twice.)

No. We only do FORCE_RESTORE if the dirty flag is set, and that will
only be set on the first load for aliasing_ppgtt, so no runtime impact.
We don't see any effect in gem_ctx_switch, but that as a static PD, as
will most context switches! I expected to see the LRI, w/a, SRM and
second EMIT_INVALIDATE to add extra overhead to a context switch (and
worse to a nop request), but the results seem encouraging that they are
in the noise.
-Chris