[Intel-gfx] [PATCH 3/3] drm/i915: Disable PMSI sleep messages on all rings around context switches

Chris Wilson chris at chris-wilson.co.uk
Mon Dec 15 08:24:48 PST 2014


On Mon, Dec 15, 2014 at 10:41:52AM +0100, Daniel Vetter wrote:
> On Thu, Dec 11, 2014 at 08:17:01AM +0000, Chris Wilson wrote:
> > There exists a current workaround to prevent a hang on context switch
> > should the ring go to sleep in the middle of the restore,
> > WaProgramMiArbOnOffAroundMiSetContext (applicable to all gen7+). In
> > spite of disabling arbitration (which prevents the ring from powering
> > down during the critical section) we were still hitting hangs that had
> > the hallmarks of the known erratum. That is we are still seeing hangs
> > "on the last instruction in the context restore". By comparing -nightly
> > (broken) with requests (working), we were able to deduce that it was the
> > semaphore LRI cross-talk that reproduced the original failure.
> > Explicitly disabling PMSI sleep on the RCS ring was insufficient, all
> > the rings had to be awake to prevent the hangs. Fortunately, we can
> > reduce the wakelock to the MI_SET_CONTEXT operation itself, and so
> > should be able to limit the extra power implications.
> > 
> > Since the MI_ARB_ON_OFF workaround is listed for all gen7 and above
> > products, we should apply this extra hammer for all of the same
> > platforms despite so far that we have only been able to reproduce the
> > hang on certain ivb and hsw models. The last question is whether we want
> > to always use the extra hammer or only when we know semaphores are in
> > operation. At the moment, we only use LRI on non-RCS rings for
> > semaphores, but that may change in the future with the possibility of
> > reintroducing this bug under subtle conditions.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80660
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83677
> > Cc: Simon Farnsworth <simon at farnz.org.uk>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: stable at vger.kernel.org
> 
> Oh dear, but awesome piece of debugging!
> 
> First question: How does this work on vlv with the split forcewake domain?
> Any bad side effects like longer ctx switch times?

It will work on vlv as well since the split power domains do not matter
here: the LRI is on an active ring, and the result of the LRI is to
temporary wake up the other domain.
The negative impact is that we do wake up other power domains and rings
during the context switch, a small but definite power increase.

> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 39 ++++++++++++++++++++++++---------
> >  drivers/gpu/drm/i915/i915_reg.h         |  2 ++
> >  2 files changed, 31 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2acf5803cf32..724ccecea06a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -484,7 +484,9 @@ mi_set_context(struct intel_engine_cs *ring,
> >  	       u32 hw_flags)
> >  {
> >  	u32 flags = hw_flags | MI_MM_SPACE_GTT;
> > -	int ret;
> > +	struct intel_engine_cs *engine;
> > +	const int num_rings = i915_semaphore_is_enabled(ring->dev) ? hweight32(INTEL_INFO(ring->dev)->ring_mask) : 0;
> > +	int len, i, ret;
> >  
> >  	/* w/a: If Flush TLB Invalidation Mode is enabled, driver must do a TLB
> >  	 * invalidation prior to MI_SET_CONTEXT. On GEN6 we don't set the value
> > @@ -501,15 +503,26 @@ mi_set_context(struct intel_engine_cs *ring,
> >  	if (!IS_HASWELL(ring->dev) && INTEL_INFO(ring->dev)->gen < 8)
> >  		flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> >  
> > -	ret = intel_ring_begin(ring, 6);
> > +
> > +	len = 4;
> > +	if (INTEL_INFO(ring->dev)->gen >= 7)
> > +		len += 4*num_rings + 2;
> > +
> > +	ret = intel_ring_begin(ring, len);
> >  	if (ret)
> >  		return ret;
> >  
> >  	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
> > -	if (INTEL_INFO(ring->dev)->gen >= 7)
> > -		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
> > -	else
> > -		intel_ring_emit(ring, MI_NOOP);
> > +	if (INTEL_INFO(ring->dev)->gen >= 7) {
> > +		if (num_rings) {
> > +			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
> > +			for_each_ring(engine, to_i915(ring->dev), i) {
> > +				intel_ring_emit(ring, RING_PSMI_CTL(engine->mmio_base));
> > +				intel_ring_emit(ring, _MASKED_BIT_ENABLE(GEN6_PSMI_SLEEP_MSG_DISABLE));
> > +			}
> > +		} else
> > +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
> 
> For added paranoia, shouldn't we keep this one here? Or is the only thing
> this does frob the idle notifications like PSMI? I kinda prefer we just
> keep it if there's no harmful effects ...

Yeah, MI_ARB_ON_OFF is just a magic command used to do the same as the
PSMI twiddling on the local ring.

We can do MI_ARB_ON_OFF + LRI(!RCS) which is slightly shorter
instruction wise, closer to the original w/a and looks slightly neater.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list