[Intel-gfx] [PATCH] drm/i915/execlists: Poison the CSB after use

Tue Oct 30 09:37:15 UTC 2018

Quoting Mika Kuoppala (2018-10-30 09:31:56)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > After reading the event status from the CSB, write back 0 (an invalid
> > value) so we can detect if the HW should signal a new event without
> > writing the event in the future.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 22b57b8926fc..126efe20d2d6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -910,6 +910,9 @@ static void process_csb(struct intel_engine_cs *engine)
> >                         execlists->active);
> >  
> >               status = buf[2 * head];
> > +             GEM_BUG_ON(!status);
> 
> Assuming we still have a timing issue in here, how about
> we poll a little until status != 0 and then continue with a warning?

If there's any race condition here, we definitely do not want to paper
over it.

> We could recover by finding the 'bit late' status, instead of
> oopsing out.

Oopsing out tells us where the problem is very concisely.

> > +             GEM_DEBUG_EXEC(WRITE_ONCE(*(u32 *)(buf + 2 * head), 0));
> 
> What I am afraid of here is that we change the timing and cache dynamics
> for our debug builds so that we bury the pesky thing.

That too is a result.

> Perhaps I am wandering too far but let's consider for the csb loop:
> 
> read head,tail;
> rmb();
> 
> for_each_csb() {
>   64 bit read 
>   64 bit write to zero it, unconditionally 
>   act_on_it()
> }
> 
> Too heavy?

Too papery - shouts that we don't know what we or the hw are doing. We
want to pretend that we know what we are doing at least.
-Chris
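
For readers following the thread, the poison-after-read idea can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the driver code: struct csb_ring, CSB_ENTRIES and
process_csb_sketch() are hypothetical names, and the -1 return stands in
for the GEM_BUG_ON(!status) in the patch. Each entry is modelled as two
dwords with the status in the first, matching buf[2 * head] in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSB_ENTRIES 6 /* hypothetical ring size for this sketch */

/* Model of the CSB: each entry is two dwords, status first. */
struct csb_ring {
	uint32_t buf[2 * CSB_ENTRIES];
	unsigned int head; /* index of the last entry consumed */
};

/*
 * Consume entries after head up to and including tail.
 * Returns the number of events consumed, or -1 if an entry still
 * holds the poison value (0), i.e. the tail moved but no new event
 * was written. The patch would hit GEM_BUG_ON(!status) at that point.
 */
static int process_csb_sketch(struct csb_ring *ring, unsigned int tail)
{
	int events = 0;

	while (ring->head != tail) {
		unsigned int next = (ring->head + 1) % CSB_ENTRIES;
		volatile uint32_t *slot = &ring->buf[2 * next];
		uint32_t status = *slot;

		if (status == 0)
			return -1; /* stale or unwritten event detected */

		*slot = 0; /* poison the entry after use */
		ring->head = next;
		events++;
	}

	return events;
}
```

In this model, a second pass over an already consumed slot fails
immediately instead of silently replaying the old status, which is the
detection the patch is after. Polling until the status becomes non-zero,
as suggested in the quoted text, would be a different design that this
sketch deliberately does not implement.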