[Intel-gfx] [PATCH] drm/i915: Rework GPU reset sequence to match driver load & thaw
Mcaulay, Alistair
alistair.mcaulay at intel.com
Thu Jul 31 18:37:14 CEST 2014
Hi Daniel,
Something more like this then? (This also reverts the change to intel_ring_begin(), putting it back to how it was.)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 991b663..b811ff2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1217,6 +1217,9 @@ struct i915_gpu_error {
/* For missed irq/seqno simulation. */
unsigned int test_irq_rings;
+
+ /* Used to prevent i915_gem_check_wedge() returning -EAGAIN during GPU reset */
+ bool reload_in_reset;
};
enum modeset_restore {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b38e086..a25d3b5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
if (i915_terminally_wedged(error))
return -EIO;
- return -EAGAIN;
+ /* Check if GPU Reset is in progress */
+ if (!error->reload_in_reset)
+ return -EAGAIN;
}
return 0;
@@ -2579,6 +2581,8 @@ void i915_gem_reset(struct drm_device *dev)
struct intel_engine_cs *ring;
int i;
+ /* Used to prevent i915_gem_check_wedge() returning -EAGAIN during GPU reset */
+ dev_priv->gpu_error.reload_in_reset = true;
/*
* Before we free the objects from the requests, we need to inspect
* them for finding the guilty party. As the requests only borrow
@@ -2591,6 +2595,8 @@ void i915_gem_reset(struct drm_device *dev)
i915_gem_reset_ring_cleanup(dev_priv, ring);
i915_gem_restore_fences(dev);
+
+ dev_priv->gpu_error.reload_in_reset = false;
}
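
For anyone following along, the intended behaviour can be modelled outside the driver. This is only a rough sketch under stated assumptions, not the driver code: the struct, the two flag macros and both function names are hypothetical stand-ins, the bit layout follows the description quoted below (bottom bit means reset in progress, top bit means wedged), and the interruptible argument of the real check is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the reset_counter bits described in the thread. */
#define RESET_IN_PROGRESS_FLAG 1u          /* bottom bit: a reset is running */
#define WEDGED                 (1u << 31)  /* top bit: GPU is terminally wedged */

/* Simplified model of the error state; not the real struct i915_gpu_error. */
struct gpu_error_model {
	unsigned int reset_counter;
	bool reload_in_reset; /* true only around the re-init sequence */
};

/* Models the patched check: -EAGAIN is suppressed only while reloading. */
static int check_wedge_model(const struct gpu_error_model *error)
{
	if (error->reset_counter & (RESET_IN_PROGRESS_FLAG | WEDGED)) {
		if (error->reset_counter & WEDGED)
			return -EIO;

		/* Every caller outside the reload still sees -EAGAIN. */
		if (!error->reload_in_reset)
			return -EAGAIN;
	}

	return 0;
}

/*
 * Models the reset path: the flag is set and cleared inside one critical
 * section (the driver would hold dev->struct_mutex here), so it never
 * leaks out to other callers.
 */
static int reset_model(struct gpu_error_model *error)
{
	int ret;

	error->reload_in_reset = true;
	ret = check_wedge_model(error); /* stands in for ring re-init */
	error->reload_in_reset = false;

	return ret;
}
```

With only the in-progress bit set, check_wedge_model() returns -EAGAIN to ordinary callers, while the same check made from inside reset_model() returns 0. A wedged GPU gives -EIO in both cases.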
-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel Vetter
Sent: Wednesday, July 30, 2014 10:01 PM
To: Mcaulay, Alistair
Cc: Daniel Vetter; Chris Wilson; Ben Widawsky; intel-gfx at lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Rework GPU reset sequence to match driver load & thaw
On Wed, Jul 30, 2014 at 04:59:33PM +0000, Mcaulay, Alistair wrote:
> Hi Daniel,
>
> could you please be clearer on the change you mean. I think you mean something functionally equivalent to the code below, but done in a less hacky way.
> (This slight change has made no change to test results) Or is the idea
> to return at a different point to this?
> I couldn't find " dev_priv->mm.reload_in_reset or similar" in the
> code. The only thing I can find is error->reset_counter, which is used
> in check_wedge(). Bottom bit set means RESET_IN_PROGRESS, top bit
> means WEDGED
Well, I meant that you have to add a new dev_priv->mm.reload_in_reset.
And the below won't work, since everywhere except during a gpu reset we want the -EAGAIN to reach callers. It's really important that we don't eat an -EAGAIN when we have one.
And I guess the check for mm.reload_in_reset should actually be in gem_check_wedged.
-Daniel
>
>
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1832,7 +1832,9 @@ int intel_ring_begin(struct intel_engine_cs
> *ring,
>
> ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> dev_priv->mm.interruptible);
> - if (ret)
> +
> + /* -EAGAIN means a reset is in progress, it is Ok to return */
> + if (ret == -EAGAIN)
> + return 0;
> + if (ret)
> + return ret;
>
> ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
>
> Alistair.
>
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On
> Behalf Of Daniel Vetter
> Sent: Tuesday, July 29, 2014 11:33 AM
> To: Chris Wilson; Daniel Vetter; Ben Widawsky;
> intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Rework GPU reset sequence
> to match driver load & thaw
>
> On Tue, Jul 29, 2014 at 08:36:33AM +0100, Chris Wilson wrote:
> > On Mon, Jul 28, 2014 at 11:26:38AM +0200, Daniel Vetter wrote:
> > > > Oh, I guess that's the tricky bit why the old approach never
> > > > worked - because reset_in_progress is set we failed the context/ppgtt
> > > > loading through the rings and screwed up.
> > >
> > > Problem with your approach is that we want to bail out here if a
> > > reset is in progress, so we can't just eat the EAGAIN. If we do
> > > that we potentially deadlock or overflow the ring.
> > >
> > > I think we need a different hack here, and a few layers down (i.e.
> > > at the place where we actually generate that offending -EAGAIN).
> > >
> > > > - Around the re-init sequence in the reset function we set
> > > > dev_priv->mm.reload_in_reset or similar. Since we hold dev->struct_mutex
> > > > no one will see that, as long as we never leak it out of the critical
> > > > section.
> > >
> > > - In the ring_begin code that checks for gpu hangs we ignore
> > > reset_in_progress if this bit is set.
> > >
> > > - Both places need fairly big comments to explain what exactly is going
> > > on.
> >
> > This is going from bad to worse. I think you can do better if you
> > looked at the problem afresh.
>
> Well we can't really reset reset_in_progress at that point, since not all reset is done yet. Especially the modeset stuff. So I don't think that reordering the reset sequence would get us out of this ugly spot. And I don't see any other solution really. Do you?
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch