[Intel-gfx] [PATCH] drm/i915: Park signaling thread while wrapping the seqno

Fri Oct 26 10:08:39 UTC 2018

Quoting Tvrtko Ursulin (2018-10-26 10:58:22)
> 
> On 24/10/2018 11:49, Chris Wilson wrote:
> > A danger encountered when resetting the seqno (using
> > debugfs/i915_next_seqno) is that as we change the breadcrumb stored in
> > the HWSP, it may be inspected by the signaler thread leading to
> > confusion in our sanity checks.
> > 
> > <0> [136.331342] i915/sig-347     3..s1 136336154us : execlists_submission_tasklet: rcs0 awake?=1, active=5
> > <0> [136.331373] i915/sig-347     3d.s2 136336155us : process_csb: rcs0 cs-irq head=5, tail=0
> > <0> [136.331402] i915/sig-347     3d.s2 136336155us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000002, active=0x5
> > <0> [136.331434] i915/sig-347     3d.s2 136336156us : process_csb: rcs0 out[0]: ctx=2.1, global=219 (fence 46:8455) (current 219), prio=0
> > <0> [136.331466] i915/sig-347     3d.s2 136336156us : process_csb: rcs0 completed ctx=2
> > <0> [136.332027] gem_exec-1049    0.... 136336246us : reset_all_global_seqno.part.5: rcs0 seqno 219 (current 219) -> -43
> > <0> [136.332056] gem_exec-1049    0.... 136336251us : reset_all_global_seqno.part.5: bcs0 seqno 183 (current 183) -> -43
> > <0> [136.332085] gem_exec-1049    0.... 136336255us : reset_all_global_seqno.part.5: vcs0 seqno 191 (current 191) -> -43
> > <0> [136.332114] gem_exec-1049    0.... 136336259us : reset_all_global_seqno.part.5: vcs1 seqno 180 (current 180) -> -43
> > <0> [136.332143] gem_exec-1049    0.... 136336262us : reset_all_global_seqno.part.5: vecs0 seqno 212 (current 212) -> -43
> > <0> [136.332174] i915/sig-347     3.... 136336280us : intel_breadcrumbs_signaler: intel_breadcrumbs_signaler:673 GEM_BUG_ON(!i915_request_completed(rq))
> 
> Why this can't happen with normal seqno wrap? And if it can, do we need 
> a fixes tag?

It can only happen if we go backwards in seqno.

> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 28819f8c4da6..71107540581d 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -136,6 +136,8 @@ static int reset_all_global_seqno(struct drm_i915_private *i915, u32 seqno)
> >                         intel_engine_get_seqno(engine),
> >                         seqno);
> >   
> > +             kthread_park(engine->breadcrumbs.signaler);
> > +
> 
> Needed even if increasing the seqno? It wouldn't work to do it under the 
> i915_seqno_passed below?

It's not needed, just the onion felt better. Cost is not really an
issue, we've already idled the GPU so flushing the kthread is not
worrying.

We might be able to replace the onion with a 
static void __signaler_flush(struct intel_engine_cs *engine)
{
	struct task_stuct *tsk = engine->breadcrumbs.signaler;

	kthread_park(tsk);
	kthread_unpark(tsk);
}
call from inside !seqno_passed, but I feel more confident if the
signaler cannot be kicked while the HWSP is in flux.
-Chris