[Intel-gfx] [PATCH] drm/i915: Fail gpu reset if the forcewake fifo hasn't drained

Ben Widawsky ben at bwidawsk.net
Sat Mar 8 19:50:41 CET 2014


On Fri, Mar 07, 2014 at 10:35:56PM +0100, Daniel Vetter wrote:
> On Fri, Mar 07, 2014 at 09:09:03PM +0100, Daniel Vetter wrote:
> > Since the gpu reset + full ppgtt merge we have a hard hang on snb when
> > running the gem_reset_stat tests. Recently Mika also added some stricter
> > forcewake fifo warnings for gen6/7 in
> > 
> > commit 20277c60ed08ab4f7237854cc6c2046649f9200f
> > Author: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Date:   Wed Mar 5 18:08:19 2014 +0200
> > 
> >     drm/i915: Always set fifo count to zero in gen6_reset
> > 
> > and they _do_ fire right before the final failing reset, which
> > then results in the machine's ultimate demise.
> > 
> > So use this indicator to fail the gpu reset with an -EIO code,
> > preventing further command submission, further hangs and so the deadly
> > final gpu reset attempt. It seems to work and my snb survives now.
> > 
> > The gpu is still dead though unfortunately.
> > 
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index c666af8232ef..9e22b11d0b0c 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -989,9 +989,11 @@ static int gen6_do_reset(struct drm_device *dev)
> >  	if (fw_engine)
> >  		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
> >  
> > -	if (IS_GEN6(dev) || IS_GEN7(dev))
> > -		WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > -			 GT_FIFO_FREE_ENTRIES_MASK) != 0);
> > +	if (IS_GEN6(dev) || IS_GEN7(dev)) {
> > +		if (WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > +			     GT_FIFO_FREE_ENTRIES_MASK) != 0))
> > +		    ret = -EIO;
> 
> Chris pointed out that this WARN doesn't make much sense, and testing
> confirmed that this completely breaks gpu reset on my machines here.
> 
> I've backed out Mika's original patch, this seems to be the wrong path.
> -Daniel
> 
> > +	}
> >  
> >  	dev_priv->uncore.fifo_count = 0;
> >  

I've seen this too. I think the WARN does match what the docs state,
but it doesn't seem to match reality. So I totally agree that backing
it out is the right course.

However, for my curiosity, Chris, can you elaborate on why you think it
doesn't make sense?
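
For anyone following along, the check under discussion can be sketched
in isolation roughly as below. This is a standalone userspace sketch,
not driver code: struct sketch_dev, sketch_reset_check() and the mask
value are hypothetical stand-ins for dev_priv, gen6_do_reset() and
GT_FIFO_FREE_ENTRIES_MASK, and the register is mocked as a plain field
instead of being read with __raw_i915_read32().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only (assumption); the real mask lives in i915_reg.h. */
#define SKETCH_FIFO_FREE_ENTRIES_MASK 0x7fu

/* Hypothetical stand-in for the bits of dev_priv this check touches. */
struct sketch_dev {
	uint32_t gtfifoctl;      /* mocked contents of GTFIFOCTL */
	unsigned int fifo_count; /* mirrors dev_priv->uncore.fifo_count */
};

/*
 * Mirrors the condition in the quoted patch: if any bit under the
 * free-entries mask is set, report -EIO instead of only warning.
 * The cached fifo count is cleared either way, as in the patch.
 */
static int sketch_reset_check(struct sketch_dev *dev)
{
	int ret = 0;

	assert(dev != NULL);

	if ((dev->gtfifoctl & SKETCH_FIFO_FREE_ENTRIES_MASK) != 0)
		ret = -EIO;

	dev->fifo_count = 0;

	return ret;
}
```

With this condition, any mocked register value that has bits set under
the mask makes the function return -EIO, and only a value of zero in
that field lets it return 0.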


-- 
Ben Widawsky, Intel Open Source Technology Center


