[Intel-gfx] [PATCH 4/6] drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen7

Mon Dec 31 10:56:15 UTC 2018

Quoting Tvrtko Ursulin (2018-12-31 10:43:07)
> 
> On 28/12/2018 17:16, Chris Wilson wrote:
> > The irq_seqno_barrier is a tradeoff between doing work on every request
> > (on the GPU) and doing work after every interrupt (on the CPU). We
> > presume we have many more requests than interrupts! However, the current
> > w/a for Ivybridge is an implicit delay that currently fails sporadically
> > and consistently if we move the w/a into the irq handler itself. This
> > makes the CPU barrier untenable for upcoming interrupt handler changes
> > and so we need to replace it with a delay on the GPU before we send the
> > MI_USER_INTERRUPT. As it turns out that delay is 32x MI_STORE_DWORD_IMM,
> > or about 0.6us per request! Quite nasty, but the lesser of two evils
> > looking to the future.
> > 
> > Testcase: igt/gem_sync
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++++++++-----------
> >   1 file changed, 44 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 2fb3a364c390..dd996103d495 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -443,6 +443,34 @@ static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> >   }
> >   static const int gen6_xcs_emit_breadcrumb_sz = 4;
> >   
> > +#define GEN7_XCS_WA 32
> > +static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> > +{
> > +     int i;
> > +
> > +     *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
> > +     *cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
> > +     *cs++ = rq->global_seqno;
> > +
> > +     for (i = 0; i < GEN7_XCS_WA; i++) {
> > +             *cs++ = MI_STORE_DWORD_INDEX;
> > +             *cs++ = I915_GEM_HWS_INDEX_ADDR;
> > +             *cs++ = rq->global_seqno;
> > +     }
> > +
> > +     *cs++ = MI_FLUSH_DW;
> > +     *cs++ = 0;
> > +     *cs++ = 0;
> > +
> > +     *cs++ = MI_USER_INTERRUPT;
> > +     *cs++ = MI_NOOP;
> > +
> > +     rq->tail = intel_ring_offset(rq, cs);
> > +     assert_ring_tail_valid(rq->ring, rq->tail);
> > +}
> > +static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;
> 
> Is it worth moving this before the function, and then in the function 
> have a little GEM_BUG_ON using cs pointer arithmetic to check the two 
> sizes match? Or even simpler the function could use 
> engine->emit_breadcrumb_sz to check against.

I have a sketch of an idea to build a request during
intel_engine_setup_common() and measure the actual breadcrumb_sz. Which
I think is the best way to remove the fragile counting by hand. (But it
required more than 5 minutes of mocking around, so it ended up on the
list of things to do later.)

Later on, we do add a check that the emit_breadcrumb matches the
breadcrumb_sz when we move the emission into i915_request_add() and can
rely on ring->emit being valid.
-Chris