[Intel-gfx] [PATCH] drm/i915: Don't unwedge if reset is disabled

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 9 21:45:44 UTC 2019


Quoting Daniele Ceraolo Spurio (2019-09-09 17:27:47)
> 
> 
> On 9/7/19 1:39 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> >>
> >>
> >> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> >>> When trying to reset a device with reset capability disabled or not
> >>> supported while rings are full of requests, it has been observed when
> >>> running in execlists submission mode that command stream buffer tail
> >>> tends to be incremented by apparently still running GPU regardless of
> >>> all requests being already cancelled and command stream buffer pointers
> >>> reset.  As a result, kernel panic on NULL pointer dereference occurs
> >>> when a trace_ports() helper is called with command stream buffer tail
> >>> incremented but request pointers being NULL during final
> >>> __intel_gt_set_wedged() operation called from intel_gt_reset().
> >>>
> >>> Skip actual reset procedure if reset is disabled or not supported.
> >>
> >> This last sentence is a bit confusing. You're not skipping the reset
> >> procedure, you're skipping the attempt of unwedging and resetting again
> >> after a reset & wedge already happened.
> > 
> > Loss of email over the last week, so jumping in at the end. My gut
> > response is that this is still just papering over the bug, as what you
> > say above makes no sense.
> > -Chris
> > 
> 
> The issue here is that if we don't reset the HW when we wedge, whatever 
> was running on the engines might complete at any point after that, which 
> generates an unexpected post-wedge CSB event that we don't handle 
> gracefully when we unwedge.

Indeed, until we call reset_default_submission all those unexpected
interrupts are redirected to the nop_submission_tasklet.

I think it should be more along the lines of

 	struct intel_timeline *tl;
 	unsigned long flags;
+	bool ok;

 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		return true;
@@ -838,7 +839,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	}
 	spin_unlock_irqrestore(&timelines->lock, flags);

-	intel_gt_sanitize(gt, false);
+	ok = false;
+	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+	if (!ok)
+		return false;

 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from

For bonus points, gpu_reset_clobbers_display should take into account
whether the display is active.
-Chris


More information about the Intel-gfx mailing list