[Intel-gfx] [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

Imre Deak imre.deak at intel.com
Wed Jun 10 08:16:20 PDT 2015


On ke, 2015-06-10 at 18:00 +0300, Ville Syrjälä wrote:
> On Wed, Jun 10, 2015 at 05:55:24PM +0300, Imre Deak wrote:
> > On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote:
> > > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote:
> > > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote:
> > > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak at intel.com> wrote:
> > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > > > > > problem where the seqno written at GPU command completion time is not
> > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale
> > > > > > seqno and not completing the request (not considering the lost
> > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > > > > > of a lost interrupt, or that the command didn't complete somehow: when
> > > > > > the coherency issue occurred I read the seqno via an uncached GTT mapping
> > > > > > too. While the cached version of the seqno still showed the stale value
> > > > > > the one read via the uncached mapping was the correct one.
> > > > > >
> > > > > > Work around this issue by clflushing the corresponding CPU cacheline
> > > > > > following any store of the seqno and preceding any reading of it. When
> > > > > > reading it do this only when the caller expects a coherent view.
> > > > > >
> > > > > > Testcase: igt/store_dword_loop_render
> > > > > > Signed-off-by: Imre Deak <imre.deak at intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
> > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
> > > > > >  2 files changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > index 9f5485d..88bc5525 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
> > > > > >  
> > > > > >  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> > > > > >  {
> > > > > > +	/*
> > > > > > +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
> > > > > > +	 * storing the completed request's seqno occasionally doesn't
> > > > > > +	 * invalidate the CPU cache. Work around this by clflushing the
> > > > > > +	 * corresponding cacheline whenever the caller wants the coherency to
> > > > > > +	 * be guaranteed. Note that this cacheline is known to be
> > > > > > +	 * clean at this point, since we only write it in gen8_set_seqno(),
> > > > > > +	 * where we also do a clflush after the write. So this clflush in
> > > > > > +	 * practice becomes an invalidate operation.
> > > 
> > > Did you compare and contrast with the gen6+ w/a? A clflush may just work
> > > out quicker considering that the posting read would involve a spinlock
> > > and fw dance.
> > 
> > Actually, I did, but only checked that it works; I didn't benchmark
> > it. I'd also expect clflush to be faster, since at this point it's
> > only a cache invalidate. But I will compare the two now.
> 
> If an mmio read fixes it then it doesn't feel like a snoop problem after
> all.

Ok, I retract what I just said. I tried the patch below just now and
still see the problem. I must have been remembering the test where I
created a separate GTT mapping for the status page and read the seqno
through that. Sorry for the confusion.

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f5485d..36e5fd6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1288,6 +1288,21 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
 
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
+	if (!lazy_coherency) {
+		struct drm_i915_private *dev_priv = ring->dev->dev_private;
+		POSTING_READ(RING_ACTHD(ring->mmio_base));
+	}
+
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
