[Intel-gfx] [PATCH] drm/i915/gt: Confirm the context survives execution
Chris Wilson
chris at chris-wilson.co.uk
Wed Oct 14 09:09:43 UTC 2020
Quoting Tvrtko Ursulin (2020-10-14 10:06:11)
>
> On 14/10/2020 09:43, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-10-14 09:36:08)
> >>
> >> On 13/10/2020 16:35, Chris Wilson wrote:
> >>> Repeat our sanity checks from before execution to after execution. One
> >>> expects that if we were to see these, the gpu would already be on fire,
> >>> but the timing may be informative.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 10 +++++++---
> >>> 1 file changed, 7 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 287537089c77..3dbdd5d0cb60 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -1216,7 +1216,8 @@ static void intel_engine_context_out(struct intel_engine_cs *engine)
> >>>
> >>> static void
> >>> execlists_check_context(const struct intel_context *ce,
> >>> - const struct intel_engine_cs *engine)
> >>> + const struct intel_engine_cs *engine,
> >>> + const char *when)
> >>> {
> >>> const struct intel_ring *ring = ce->ring;
> >>> u32 *regs = ce->lrc_reg_state;
> >>> @@ -1251,7 +1252,7 @@ execlists_check_context(const struct intel_context *ce,
> >>> valid = false;
> >>> }
> >>>
> >>> - WARN_ONCE(!valid, "Invalid lrc state found before submission\n");
> >>> + WARN_ONCE(!valid, "Invalid lrc state found %s submission\n", when);
> >>> }
> >>>
> >>> static void restore_default_state(struct intel_context *ce,
> >>> @@ -1347,7 +1348,7 @@ __execlists_schedule_in(struct i915_request *rq)
> >>> reset_active(rq, engine);
> >>>
> >>> if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> >>> - execlists_check_context(ce, engine);
> >>> + execlists_check_context(ce, engine, "before");
> >>>
> >>> if (ce->tag) {
> >>> /* Use a fixed tag for OA and friends */
> >>> @@ -1418,6 +1419,9 @@ __execlists_schedule_out(struct i915_request *rq,
> >>> * refrain from doing non-trivial work here.
> >>> */
> >>>
> >>> + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> >>> + execlists_check_context(ce, engine, "after");
> >>> +
> >>
> >> CI failures here are either something super scary or a simple mistake
> >> which I cannot see. Or is engine retire, possibly queued up before,
> >> racing with current schedule_out?
> >
> > It's the unpark while the process_csb is not yet flushed, so we scrub
> > the kernel_context before it is scheduled-out. It could in theory be a
> > real problem with our scrubbing to simulate an issue causing an issue,
> > but the timing is quite slim.
>
> Unpark with unflushed process_csb? I thought maybe you meant park, but
> poisoning is indeed in unpark. But pending process_csb means engine is
> supposed to be unparked already. Or you are saying it went through the
> parked-unparked cycle all with pending process_csb?

Yes. A pending CSB holds a GT wakeref (for the interrupt), not an engine
wakeref, which boils down to the fact that we use engine parking to force
the context switch with one last submission.
-Chris
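
The ordering described in this exchange can be sketched as a small
userspace toy model. This is one reading of the thread, not i915 code:
every name below (toy_engine, toy_unpark, toy_process_csb, and so on) is
hypothetical, the two wakerefs are reduced to plain counters, and the
debug-build poisoning is reduced to a single marker value. It only
illustrates why the new "after" check can fire when the engine goes
through a park/unpark cycle while a CSB event is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: all names below are hypothetical, not i915 code. */
enum { TOY_STATE_VALID = 1, TOY_STATE_POISON = 0x5a };

struct toy_engine {
	int gt_wakeref;       /* held by a pending CSB event (for the interrupt) */
	int engine_wakeref;   /* held while a request is outstanding */
	bool csb_pending;     /* context-switch event not yet processed */
	int kernel_ctx_state; /* stand-in for the kernel context image */
};

/* Same shape as the patched check: report when the bad state was seen. */
static bool toy_check_context(const struct toy_engine *e, const char *when)
{
	bool valid = e->kernel_ctx_state == TOY_STATE_VALID;

	if (!valid)
		printf("Invalid lrc state found %s submission\n", when);
	return valid;
}

/* Parking forces a context switch with one last submission. */
static void toy_park_submit(struct toy_engine *e)
{
	toy_check_context(e, "before");
	e->engine_wakeref++;
}

/* Hardware finishes: a CSB event is queued, pinning only the GT. */
static void toy_hw_complete(struct toy_engine *e)
{
	e->csb_pending = true;
	e->gt_wakeref++;
}

/* Retirement drops the engine wakeref, so the engine is now parked. */
static void toy_retire(struct toy_engine *e)
{
	e->engine_wakeref--;
	assert(e->engine_wakeref == 0);
}

/* Unpark scrubs the kernel context (modelled as simple poisoning). */
static void toy_unpark(struct toy_engine *e)
{
	assert(e->engine_wakeref == 0);
	e->engine_wakeref++;
	e->kernel_ctx_state = TOY_STATE_POISON;
}

/* Processing the CSB schedules the context out and runs the "after" check. */
static bool toy_process_csb(struct toy_engine *e)
{
	assert(e->csb_pending);
	e->csb_pending = false;
	e->gt_wakeref--;
	return toy_check_context(e, "after");
}

/* Returns whether the "after" check passed for the chosen ordering. */
static bool toy_run(bool flush_csb_before_unpark)
{
	struct toy_engine e = { 0, 0, false, TOY_STATE_VALID };
	bool ok = false;

	toy_park_submit(&e);
	toy_hw_complete(&e);
	toy_retire(&e);            /* engine parked, CSB still pending */
	if (flush_csb_before_unpark)
		ok = toy_process_csb(&e);
	toy_unpark(&e);            /* scrubs the kernel context */
	if (!flush_csb_before_unpark)
		ok = toy_process_csb(&e);
	return ok;
}
```

In this model, toy_run(false) reproduces the ordering discussed above
(the unpark scrubs the context before the pending CSB is processed, so
the "after" check reports invalid state), while toy_run(true) processes
the CSB first and the check passes.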