[Intel-gfx] [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere

Mon Aug 28 19:46:00 UTC 2017

On Mon, Aug 28, 2017 at 12:41:58PM -0700, Michel Thierry wrote:
> On 28/08/17 12:25, jeff.mcgee at intel.com wrote:
> >From: Jeff McGee <jeff.mcgee at intel.com>
> >
> >If someone else is resetting the engine, we should clear its bit in our
> >local engine_mask as part of skipping that engine. Otherwise we will
> >later believe that it has not been reset successfully and trigger a
> >full GPU reset. If the other caller's reset actually fails, it will
> >trigger the full GPU reset itself.
> >
> 
> Did you hit this by manually setting wedged to 'x' ring repeatedly?
> 
I haven't actually reproduced it. I have just been studying the code
closely while developing reset for preemption enforcement. That
implementation will call i915_handle_error from another work item, which
can run concurrently with hangcheck.

> >Signed-off-by: Jeff McGee <jeff.mcgee at intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_irq.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 5d391e689070..575d618ccdbf 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -2711,8 +2711,10 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> >  		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
> >  			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> >  			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> >-					     &dev_priv->gpu_error.flags))
> >+					     &dev_priv->gpu_error.flags)) {
> >+				engine_mask &= ~intel_engine_flag(engine);
> >  				continue;
> >+			}
> >  			if (i915_reset_engine(engine, 0) == 0)
> >  				engine_mask &= ~intel_engine_flag(engine);
> >
> 
> Reviewed-by: Michel Thierry <michel.thierry at intel.com>