[Intel-gfx] [PATCH] drm/i915/gt: Confirm the context survives execution

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Oct 14 09:06:11 UTC 2020


On 14/10/2020 09:43, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-10-14 09:36:08)
>>
>> On 13/10/2020 16:35, Chris Wilson wrote:
>>> Repeat our sanitychecks from before execution to after execution. One
>>> expects that if we were to see these, the gpu would already be on fire,
>>> but the timing may be informative.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 10 +++++++---
>>>    1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 287537089c77..3dbdd5d0cb60 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1216,7 +1216,8 @@ static void intel_engine_context_out(struct intel_engine_cs *engine)
>>>    
>>>    static void
>>>    execlists_check_context(const struct intel_context *ce,
>>> -                     const struct intel_engine_cs *engine)
>>> +                     const struct intel_engine_cs *engine,
>>> +                     const char *when)
>>>    {
>>>        const struct intel_ring *ring = ce->ring;
>>>        u32 *regs = ce->lrc_reg_state;
>>> @@ -1251,7 +1252,7 @@ execlists_check_context(const struct intel_context *ce,
>>>                valid = false;
>>>        }
>>>    
>>> -     WARN_ONCE(!valid, "Invalid lrc state found before submission\n");
>>> +     WARN_ONCE(!valid, "Invalid lrc state found %s submission\n", when);
>>>    }
>>>    
>>>    static void restore_default_state(struct intel_context *ce,
>>> @@ -1347,7 +1348,7 @@ __execlists_schedule_in(struct i915_request *rq)
>>>                reset_active(rq, engine);
>>>    
>>>        if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
>>> -             execlists_check_context(ce, engine);
>>> +             execlists_check_context(ce, engine, "before");
>>>    
>>>        if (ce->tag) {
>>>                /* Use a fixed tag for OA and friends */
>>> @@ -1418,6 +1419,9 @@ __execlists_schedule_out(struct i915_request *rq,
>>>         * refrain from doing non-trivial work here.
>>>         */
>>>    
>>> +     if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
>>> +             execlists_check_context(ce, engine, "after");
>>> +
>>
>> CI failures here are either something super scary or a simple mistake
>> which I cannot see. Or is engine retire, possibly queued up before,
>> racing with the current schedule_out?
> 
> It's the unpark while the process_csb is not yet flushed, so we scrub
> the kernel_context before it is scheduled-out. It could in theory be a
> real problem, with our scrubbing (meant to simulate an issue) causing
> an actual issue, but the window is quite slim.

Unpark with unflushed process_csb? I thought maybe you meant park, but
poisoning is indeed in unpark. But a pending process_csb means the engine
is supposed to be unparked already. Or are you saying it went through a
full park-unpark cycle, all with process_csb still pending?

Regards,

Tvrtko



