[Intel-gfx] [PATCH] drm/i915: Fail gpu reset if the forcewake fifo hasn't drained
Ben Widawsky
ben at bwidawsk.net
Sat Mar 8 19:50:41 CET 2014
On Fri, Mar 07, 2014 at 10:35:56PM +0100, Daniel Vetter wrote:
> On Fri, Mar 07, 2014 at 09:09:03PM +0100, Daniel Vetter wrote:
> > Since the gpu reset + full ppgtt merge, we have a hard hang on snb when
> > running the gem_reset_stat tests. Recently Mika also added some stricter
> > forcewake fifo warnings for gen6/7 in
> >
> > commit 20277c60ed08ab4f7237854cc6c2046649f9200f
> > Author: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Date: Wed Mar 5 18:08:19 2014 +0200
> >
> > drm/i915: Always set fifo count to zero in gen6_reset
> >
> > and they _do_ fire right before the final failing reset, which
> > then results in the machine's ultimate demise.
> >
> > So use this indicator to fail the gpu reset with an -EIO code,
> > preventing further command submission, further hangs and thus the deadly
> > final gpu reset attempt. It seems to work, and my snb now survives.
> >
> > Unfortunately, the gpu is still dead, though.
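The recovery strategy described in the paragraph above (fail the reset with -EIO, then refuse further command submission so no new hang can provoke another reset) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not i915 code: `struct fake_gpu`, its `wedged` flag, `fake_reset()` and `fake_submit()` are all hypothetical, and the `fifo_drained` argument simply stands in for whatever health check the driver performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state: "wedged" stands in for the driver's notion of
 * a GPU that must not receive any more work after a failed reset. */
struct fake_gpu {
	bool wedged;
};

/* Hypothetical reset: fifo_drained stands in for the health check done
 * after the hardware reset. On failure, give up instead of retrying. */
static int fake_reset(struct fake_gpu *gpu, bool fifo_drained)
{
	if (!fifo_drained) {
		gpu->wedged = true;
		return -EIO;
	}
	return 0;
}

/* Submission path: once wedged, every request fails fast with -EIO, so no
 * new batch can hang the GPU again and trigger another reset attempt. */
static int fake_submit(const struct fake_gpu *gpu)
{
	return gpu->wedged ? -EIO : 0;
}
```

For example, after `fake_reset(&g, false)` returns -EIO, every later `fake_submit(&g)` also returns -EIO. The design choice is to trade a dead GPU for a live machine, which matches the outcome reported above.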
> >
> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ---
> > drivers/gpu/drm/i915/intel_uncore.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index c666af8232ef..9e22b11d0b0c 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -989,9 +989,11 @@ static int gen6_do_reset(struct drm_device *dev)
> >  	if (fw_engine)
> >  		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
> > 
> > -	if (IS_GEN6(dev) || IS_GEN7(dev))
> > -		WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > -			 GT_FIFO_FREE_ENTRIES_MASK) != 0);
> > +	if (IS_GEN6(dev) || IS_GEN7(dev)) {
> > +		if (WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > +			     GT_FIFO_FREE_ENTRIES_MASK) != 0))
> > +			ret = -EIO;
>
> Chris pointed out that this WARN doesn't make much sense, and testing
> confirmed that this completely breaks gpu reset on my machines here.
>
> I've backed out Mika's original patch; this seems to be the wrong path.
> -Daniel
>
> > +	}
> > 
> >  	dev_priv->uncore.fifo_count = 0;
> >
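The patch relies on the fact that the kernel's `WARN_ON()` evaluates to the truth value of its condition, so the same expression can both log a warning and guard an `if`. The sketch below reproduces that idiom in userspace C to contrast the old warn-only check with the new warn-and-fail check. Assumptions: `warn_on_impl()`, `check_warn_only()` and `check_and_fail()` are hypothetical helpers, and `FAKE_FIFO_FREE_ENTRIES_MASK` is an illustrative value, not the real `GT_FIFO_FREE_ENTRIES_MASK` from `i915_reg.h`. The condition is copied from the diff only to show control flow; as the thread notes, the check itself turned out to be wrong.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): print a warning when the
 * condition holds and evaluate to its truth value (0 or 1). */
static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Illustrative mask only (hypothetical), not the i915 register definition. */
#define FAKE_FIFO_FREE_ENTRIES_MASK 0x7fu

/* Before the patch: the check only warns; the reset result is unaffected. */
static int check_warn_only(uint32_t gtfifoctl)
{
	int ret = 0;

	WARN_ON((gtfifoctl & FAKE_FIFO_FREE_ENTRIES_MASK) != 0);
	return ret;
}

/* After the patch: the same condition also turns the reset into -EIO. */
static int check_and_fail(uint32_t gtfifoctl)
{
	int ret = 0;

	if (WARN_ON((gtfifoctl & FAKE_FIFO_FREE_ENTRIES_MASK) != 0))
		ret = -EIO;
	return ret;
}
```

With a simulated register value of 0x10, both helpers print the warning, but only `check_and_fail()` returns -EIO. That is exactly why a false positive in this check "completely breaks gpu reset", as reported in the reply above.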
I've seen this too. Although I think the WARN matches what the docs
state, it doesn't seem to match reality, so I totally agree this is the
right course.
However, out of curiosity, Chris, can you elaborate on why you think it
doesn't make sense?
--
Ben Widawsky, Intel Open Source Technology Center