[Intel-gfx] [PATCH 3/6] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy

Fri Jan 15 06:35:42 PST 2016

In legacy mode, we use the gen6 seqno barrier to insert a delay after
the interrupt before reading the seqno (as the seqno write is not
flushed before the interrupt is sent, the interrupt arrives before the
seqno is visible). Execlists ignored the evidence of igt.

Note that is harder, but not impossible, to reproduce the missed
interrupt syndrome with execlists. This is primarily because execlists
itself being interrupt driven helps mask the issue.

v2: Rebase and unsquash!

I kept it as gen6_seqno_barrier() for I have a long term plan to merge
the two implementations.

Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala at intel.com
---
 drivers/gpu/drm/i915/intel_lrc.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3ed4ab7f571e..b8bd77bc109e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1791,18 +1791,28 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
 }
 
-static void bxt_seqno_barrier(struct intel_engine_cs *ring)
+static void
+gen6_seqno_barrier(struct intel_engine_cs *ring)
 {
-	/*
-	 * On BXT A steppings there is a HW coherency issue whereby the
+	/* Workaround to force correct ordering between irq and seqno writes on
+	 * ivb (and maybe also on snb) by reading from a CS register (like
+	 * ACTHD) before reading the status page.
+	 *
+	 * Note that this effectively effectively stalls the read by the time
+	 * it takes to do a memory transaction, which more or less ensures
+	 * that the write from the GPU has sufficient time to invalidate
+	 * the CPU cacheline. Alternatively we could delay the interrupt from
+	 * the CS ring to give the write time to land, but that would incur
+	 * a delay after every batch i.e. much more frequent than a delay
+	 * when waiting for the interrupt (with the same net latency).
+	 *
+	 * Also on BXT A steppings there is a HW coherency issue whereby the
 	 * MI_STORE_DATA_IMM storing the completed request's seqno
-	 * occasionally doesn't invalidate the CPU cache. Work around this by
-	 * clflushing the corresponding cacheline whenever the caller wants
-	 * the coherency to be guaranteed. Note that this cacheline is known
-	 * to be clean at this point, since we only write it in
-	 * bxt_a_set_seqno(), where we also do a clflush after the write. So
-	 * this clflush in practice becomes an invalidate operation.
+	 * occasionally doesn't invalidate the CPU cache.
 	 */
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
 	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
@@ -1949,12 +1959,11 @@ logical_ring_default_vfuncs(struct drm_device *dev,
 	ring->irq_get = gen8_logical_ring_get_irq;
 	ring->irq_put = gen8_logical_ring_put_irq;
 	ring->emit_bb_start = gen8_emit_bb_start;
+	ring->irq_seqno_barrier = gen6_seqno_barrier;
 	ring->get_seqno = gen8_get_seqno;
 	ring->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) {
-		ring->irq_seqno_barrier = bxt_seqno_barrier;
+	if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
 		ring->set_seqno = bxt_a_set_seqno;
-	}
 }
 
 static inline void
-- 
2.7.0.rc3