[Intel-gfx] [PATCH] drm/i915: Don't unwedge if reset is disabled
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 9 21:45:44 UTC 2019
Quoting Daniele Ceraolo Spurio (2019-09-09 17:27:47)
>
>
> On 9/7/19 1:39 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> >>
> >>
> >> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> >>> When trying to reset a device with reset capability disabled or not
> >>> supported while rings are full of requests, it has been observed when
> >>> running in execlists submission mode that command stream buffer tail
> >>> tends to be incremented by apparently still running GPU regardless of
> >>> all requests being already cancelled and command stream buffer pointers
> >>> reset. As a result, kernel panic on NULL pointer dereference occurs
> >>> when a trace_ports() helper is called with command stream buffer tail
> >>> incremented but request pointers being NULL during final
> >>> __intel_gt_set_wedged() operation called from intel_gt_reset().
> >>>
> >>> Skip actual reset procedure if reset is disabled or not supported.
> >>
> >> This last sentence is a bit confusing. You're not skipping the reset
> >> procedure, you're skipping the attempt of unwedging and resetting again
> >> after a reset & wedge already happened.
> >
> > Loss of email over the last week, so jumping in at the end. My gut
> > response is that this is still just papering over the bug, as what you
> > say above makes no sense.
> > -Chris
> >
>
> The issue here is that if we don't reset the HW when we wedge, whatever
> was running on the engines might complete at any point after that, which
> generates an unexpected post-wedge CSB event that we don't handle
> gracefully when we unwedge.
Indeed, until we call reset_default_submission all those unexpected
interrupts are redirected to the nop_submission_tasklet.
I think it should be more along the lines of
struct intel_timeline *tl;
unsigned long flags;
+ bool ok;
if (!test_bit(I915_WEDGED, >->reset.flags))
return true;
@@ -838,7 +839,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
}
spin_unlock_irqrestore(&timelines->lock, flags);
- intel_gt_sanitize(gt, false);
+ ok = false;
+ if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+ ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+ if (!ok)
+ return false;
/*
* Undo nop_submit_request. We prevent all new i915 requests from
For bonus points, gpu_reset_clobbers_display should take into account
whether the display is active.
-Chris
More information about the Intel-gfx
mailing list