[Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 7 15:12:18 UTC 2018
Quoting Ville Syrjälä (2018-11-07 15:04:24)
> On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> > Exercising the gpu reloc path strenuously revealed an issue where the
> > updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> > upon execution. After some experiments with adding pipecontrols (a lot
> > of pipecontrols (32), as gen4/5 do not have a bit to wait on earlier pipe
> > controls, or even the current one), it was discovered that we merely
> > needed to delay the EMIT_INVALIDATE by several flushes. It is important
> > to note that it is the EMIT_INVALIDATE, not the EMIT_FLUSH, that
> > needs the delay, contrary to what one might first expect. That is,
> > the delay is required for the TLB invalidation to take effect (one
> > presumes to purge any CS buffers), as opposed to a delay after
> > flushing to ensure the writes have landed before triggering the
> > invalidation.
> >
> > Testcase: igt/gem_tiled_fence_blits
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: stable at vger.kernel.org
> > ---
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> > 1 file changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index b8a7a014d46d..87eebc13c0d8 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -91,6 +91,7 @@ static int
> > gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> > {
> > u32 cmd, *cs;
> > + int i;
> >
> > /*
> > * read/write caches:
> > @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> > cmd |= MI_INVALIDATE_ISP;
> > }
> >
> > - cs = intel_ring_begin(rq, 2);
> > + i = 2;
> > + if (mode & EMIT_INVALIDATE)
> > + i += 20;
> > +
> > + cs = intel_ring_begin(rq, i);
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> >
> > *cs++ = cmd;
> > - *cs++ = MI_NOOP;
> > +
> > + /*
> > + * A random delay to let the CS invalidate take effect? Without this
> > + * delay, the GPU relocation path fails as the CS does not see
> > + * the updated contents. Just as important, if we apply the flushes
> > + * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> > + * write and before the invalidate on the next batch), the relocations
> > + * still fail. This implies that it is a delay following invalidation
> > + * that is required to reset the caches, as opposed to a delay to
> > + * ensure the memory is written.
> > + */
> > + if (mode & EMIT_INVALIDATE) {
> > + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > + PIPE_CONTROL_GLOBAL_GTT;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > +
> > + for (i = 0; i < 12; i++)
> > + *cs++ = MI_FLUSH;
> > +
> > + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > + PIPE_CONTROL_GLOBAL_GTT;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + }
>
> This smells a lot like the snb a/b w/a, except there the spec says to
> use 8 STORE_DWORDS.
Yeah, the similarity wasn't lost on me, except that that w/a covers the
coherency aspect of the writes not being flushed. This feels a bit
fishier in that the experiments indicate it's an issue on the
invalidation path as opposed to flushing the writes.
There's also the other w/a of using umpteen pipecontrols to get around
the lack of PIPE_CONTROL_FLUSH.
> I suppose the choice of a specific command isn't
> critical, and it's just a matter of stuffing the pipeline with something
> that takes long enough to let the TLB invalidate finish?
Except that MI_FLUSHes are more effective in smaller numbers than
PIPE_CONTROLs, probably because each one translates to a few pipe
controls or something equally heavy.
> Anyways, patch itself seems as reasonable as one might expect for an
> issue like this.
As nasty as one would expect.
For the record, we are not entirely out of danger. gem_exec_whisper
continues to indicate a problem, but one step at a time. (I haven't
found quite what's upsetting it yet, except that if we run each batch
synchronously and verify each one, it's happy.)
-Chris