[Intel-gfx] [PATCH] drm/i915: Park signaling thread while wrapping the seqno

Fri Oct 26 11:05:01 UTC 2018

On 26/10/2018 11:08, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-10-26 10:58:22)
>>
>> On 24/10/2018 11:49, Chris Wilson wrote:
>>> A danger encountered when resetting the seqno (using
>>> debugfs/i915_next_seqno) is that as we change the breadcrumb stored in
>>> the HWSP, it may be inspected by the signaler thread leading to
>>> confusion in our sanity checks.
>>>
>>> <0> [136.331342] i915/sig-347     3..s1 136336154us : execlists_submission_tasklet: rcs0 awake?=1, active=5
>>> <0> [136.331373] i915/sig-347     3d.s2 136336155us : process_csb: rcs0 cs-irq head=5, tail=0
>>> <0> [136.331402] i915/sig-347     3d.s2 136336155us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000002, active=0x5
>>> <0> [136.331434] i915/sig-347     3d.s2 136336156us : process_csb: rcs0 out[0]: ctx=2.1, global=219 (fence 46:8455) (current 219), prio=0
>>> <0> [136.331466] i915/sig-347     3d.s2 136336156us : process_csb: rcs0 completed ctx=2
>>> <0> [136.332027] gem_exec-1049    0.... 136336246us : reset_all_global_seqno.part.5: rcs0 seqno 219 (current 219) -> -43
>>> <0> [136.332056] gem_exec-1049    0.... 136336251us : reset_all_global_seqno.part.5: bcs0 seqno 183 (current 183) -> -43
>>> <0> [136.332085] gem_exec-1049    0.... 136336255us : reset_all_global_seqno.part.5: vcs0 seqno 191 (current 191) -> -43
>>> <0> [136.332114] gem_exec-1049    0.... 136336259us : reset_all_global_seqno.part.5: vcs1 seqno 180 (current 180) -> -43
>>> <0> [136.332143] gem_exec-1049    0.... 136336262us : reset_all_global_seqno.part.5: vecs0 seqno 212 (current 212) -> -43
>>> <0> [136.332174] i915/sig-347     3.... 136336280us : intel_breadcrumbs_signaler: intel_breadcrumbs_signaler:673 GEM_BUG_ON(!i915_request_completed(rq))
>>
>> Why this can't happen with normal seqno wrap? And if it can, do we need
>> a fixes tag?
> 
> It can only happen if we go backwards in seqno.
> 
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 28819f8c4da6..71107540581d 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -136,6 +136,8 @@ static int reset_all_global_seqno(struct drm_i915_private *i915, u32 seqno)
>>>                          intel_engine_get_seqno(engine),
>>>                          seqno);
>>>    
>>> +             kthread_park(engine->breadcrumbs.signaler);
>>> +
>>
>> Needed even if increasing the seqno? It wouldn't work to do it under the
>> i915_seqno_passed below?
> 
> It's not needed, just the onion felt better. Cost is not really an
> issue, we've already idled the GPU so flushing the kthread is not
> worrying.
> 
> We might be able to replace the onion with a
> static void __signaler_flush(struct intel_engine_cs *engine)
> {
> 	struct task_stuct *tsk = engine->breadcrumbs.signaler;
> 
> 	kthread_park(tsk);
> 	kthread_unpark(tsk);
> }
> call from inside !seqno_passed, but I feel more confident if the
> signaler cannot be kicked while the HWSP is in flux.

Okay, makes sense.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko