[Intel-gfx] ✗ Fi.CI.BAT: warning for drm/i915: Move CSB MMIO reads out of the execlists lock (rev2)
Chris Wilson
chris at chris-wilson.co.uk
Tue Mar 8 10:16:32 UTC 2016
On Thu, Mar 03, 2016 at 08:50:46PM +0000, Chris Wilson wrote:
> Yes, patch is sane, I'm just messing around with my Braswell at the
> moment and then I'll try again at getting some numbers. First glance
> said 10% in reducing latency (with a 100% throughput improvement in one
> particular small copy scenario, that I want to reproduce and do some back
> of the envelope calculations to check that it is sane), but the machine
> (thanks to execlists) just dies as soon as I try some more interesting
> benchmarks.
I rebased the patch on top of the execlists thread so that I could
actually use the machine...
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4d005dd..5c0a4e0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -549,13 +549,10 @@ static int intel_execlists_submit(void *arg)
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
do {
- u32 status;
- u32 status_id;
- u32 submit_contexts;
- u32 status_pointer;
unsigned read_pointer, write_pointer;
-
- spin_lock(&ring->execlist_lock);
+ u32 csb[GEN8_CSB_ENTRIES][2];
+ u32 status_pointer;
+ unsigned i, read, submit_contexts;
set_current_state(TASK_INTERRUPTIBLE);
status_pointer = I915_READ_FW(RING_CONTEXT_STATUS_PTR(ring));
@@ -563,8 +560,6 @@ static int intel_execlists_submit(void *arg)
write_pointer = GEN8_CSB_WRITE_PTR(status_pointer);
if (read_pointer == write_pointer) {
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
- spin_unlock(&ring->execlist_lock);
-
if (kthread_should_stop())
return 0;
@@ -577,37 +572,41 @@ static int intel_execlists_submit(void *arg)
if (read_pointer > write_pointer)
write_pointer += GEN8_CSB_ENTRIES;
- submit_contexts = 0;
+ read = 0;
while (read_pointer < write_pointer) {
- status = get_context_status(ring, ++read_pointer, &status_id);
+ csb[read][0] = get_context_status(ring, ++read_pointer,
+ &csb[read][1]);
+ read++;
+ }
- if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED)) {
- if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
- if (execlists_check_remove_request(ring, status_id))
+ I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
+ _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
+ (write_pointer % GEN8_CSB_ENTRIES) << 8));
+
+ spin_lock(&ring->execlist_lock);
+
+ submit_contexts = 0;
+ for (i = 0; i < read; i++) {
+ if (unlikely(csb[i][0] & GEN8_CTX_STATUS_PREEMPTED)) {
+ if (csb[i][0] & GEN8_CTX_STATUS_LITE_RESTORE) {
+ if (execlists_check_remove_request(ring, csb[i][1]))
WARN(1, "Lite Restored request removed from queue\n");
} else
WARN(1, "Preemption without Lite Restore\n");
}
- if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
- GEN8_CTX_STATUS_ELEMENT_SWITCH))
+ if (csb[i][0] & (GEN8_CTX_STATUS_ACTIVE_IDLE |
+ GEN8_CTX_STATUS_ELEMENT_SWITCH))
submit_contexts +=
- execlists_check_remove_request(ring, status_id);
+ execlists_check_remove_request(ring, csb[i][1]);
}
if (submit_contexts) {
if (!ring->disable_lite_restore_wa ||
- (status & GEN8_CTX_STATUS_ACTIVE_IDLE))
+ (csb[i - 1][0] & GEN8_CTX_STATUS_ACTIVE_IDLE))
execlists_context_unqueue__locked(ring);
}
-
- /* Update the read pointer to the old write pointer. Manual ringbuffer
- * management ftw </sarcasm> */
- I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
- _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
- (write_pointer % GEN8_CSB_ENTRIES) << 8));
-
spin_unlock(&ring->execlist_lock);
if (unlikely(submit_contexts > 2))
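To make the shape of the change easier to see outside the driver, here is a minimal userspace sketch of the same two-phase idea: first copy the pending status entries into a local array with no lock held, then take the lock only to act on the copies. Everything in it is an assumption made for illustration and not i915 code: struct fake_ring, CSB_ENTRIES, STATUS_COMPLETE, snapshot_csb() and process_csb() are hypothetical names, plain arrays stand in for the MMIO reads, and a C11 atomic_flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define CSB_ENTRIES     6    /* illustrative ring size (hypothetical name) */
#define STATUS_COMPLETE 0x1u /* made-up status bit, not a hardware value */

/* Hypothetical stand-in for the engine state touched by the patch. */
struct fake_ring {
    uint32_t status[CSB_ENTRIES]; /* models the status dword of each slot */
    uint32_t id[CSB_ENTRIES];     /* models the context id of each slot */
    unsigned read_ptr;            /* last slot the driver consumed */
    unsigned write_ptr;           /* last slot the "hardware" wrote */
    atomic_flag lock;             /* models execlist_lock */
    unsigned completed;           /* state that the lock protects */
};

static void fake_ring_init(struct fake_ring *ring)
{
    memset(ring, 0, sizeof(*ring));
    atomic_flag_clear(&ring->lock);
}

static void ring_lock(struct fake_ring *ring)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&ring->lock,
                                             memory_order_acquire))
        ;
}

static void ring_unlock(struct fake_ring *ring)
{
    atomic_flag_clear_explicit(&ring->lock, memory_order_release);
}

/* Phase 1: drain pending entries into csb[][] without holding the lock. */
static unsigned snapshot_csb(struct fake_ring *ring, uint32_t csb[][2])
{
    unsigned rp = ring->read_ptr;
    unsigned wp = ring->write_ptr;
    unsigned n = 0;

    if (rp > wp)
        wp += CSB_ENTRIES; /* unwrap so that rp < wp covers the wrap */

    while (rp < wp) {
        unsigned slot = ++rp % CSB_ENTRIES;

        csb[n][0] = ring->status[slot];
        csb[n][1] = ring->id[slot];
        n++;
    }

    /* Publish the new read pointer, still outside the lock. */
    ring->read_ptr = wp % CSB_ENTRIES;
    return n;
}

/* Phase 2: take the lock only to act on the copied entries. */
static unsigned process_csb(struct fake_ring *ring, uint32_t csb[][2],
                            unsigned n)
{
    unsigned i, done = 0;

    ring_lock(ring);
    for (i = 0; i < n; i++)
        if (csb[i][0] & STATUS_COMPLETE)
            done++;
    ring->completed += done;
    ring_unlock(ring);

    return done;
}
```

A caller would declare a local uint32_t csb[CSB_ENTRIES][2], call snapshot_csb() and then hand the returned count to process_csb(). The point of the split is that the slow reads no longer extend the time the lock is held; whether that matters in practice depends on how expensive the reads are relative to the work done under the lock.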
On Braswell, that improves the nop dispatch latency by roughly 20%:
gem:exec:latency:0: -0.35%
gem:exec:latency:1: +4.57%
gem:exec:latency:2: +0.07%
gem:exec:latency:4: +18.05%
gem:exec:latency:8: +26.97%
gem:exec:latency:16: +20.37%
gem:exec:latency:32: +19.91%
gem:exec:latency:64: +24.06%
gem:exec:latency:128: +23.75%
gem:exec:latency:256: +24.54%
gem:exec:latency:512: +24.30%
gem:exec:latency:1024: +24.43%
Outside of that scenario, the changes are more or less in the noise,
even if we look at the full round-trip latency for synchronous execution:
gem:exec:nop:rcs:single: -2.68%
gem:exec:nop:rcs:continuous: -2.28%
gem:exec:nop:bcs:single: -2.31%
gem:exec:nop:bcs:continuous: +16.64%
gem:exec:nop:vcs:single: -6.24%
gem:exec:nop:vcs:continuous: +3.76%
gem:exec:nop:vecs:single: +2.56%
gem:exec:nop:vecs:continuous: +1.83%
And with any busywork on top, we lose the effect:
gem:exec:trace:Atlantis: -0.12%
gem:exec:trace:glamor: +2.08%
gem:exec:trace:glxgears: +0.79%
gem:exec:trace:OglBatch7: +0.45%
gem:exec:trace:sna: +0.57%
gem:exec:trace:unigine-valley: +0.01%
gem:exec:trace:uxa: +5.63%
I am hesitant to r-b something that cannot be tested, since the relevant
igts simply explode on my machine both before and after the patch.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre