[PATCH 44/59] drm/i915: Increase the busyspin durations for i915_wait_request

Mon Oct 2 10:14:53 UTC 2017

An interesting discussion regarding "hybrid interrupt polling" for NVMe
came to the conclusion that the ideal busyspin before sleeping was half
of the expected request latency (and better if it was already halfway
through that request). This suggested that we too should look again at
our tradeoff between spinning and waiting. Currently, our spin simply
tries to hide the cost of enabling the interrupt, which is good to avoid
penalising nop requests (i.e. test throughput) and not much else.
Studying real world workloads suggests that a spin of upto 500us can
dramatically boost performance, but the suggestion is that this is not
from avoiding interrupt latency per-se, but from secondary effects of
sleeping such as allowing the CPU reduce cstate and context switch away.
To offset those costs from penalising the active client, bump the initial
spin somewhat to 250us and the secondary spin to 20us to balance the cost
of another context switch following the interrupt.

Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Sagar Kamble <sagar.a.kamble at intel.com>
Cc: Eero Tamminen <eero.t.tamminen at intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Cc: Ben Widawsky <ben at bwidawsk.net>
Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b100b38f1dd2..38eab7239492 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -1174,8 +1174,20 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
 	GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
 
-	/* Optimistic short spin before touching IRQs */
-	if (__i915_spin_request(req, wait.seqno, state, 5))
+	/* Optimistic spin before touching IRQs.
+	 *
+	 * We use a rather large value here to offset the penalty of switching
+	 * away from the active task. Frequently, the client will wait upon
+	 * an old swapbuffer to throttle itself to remain within a frame of
+	 * the gpu. If the client is running in lockstep with the gpu, then
+	 * it should not be waiting long at all, and a sleep now will incur
+	 * extra scheduler latency in producing the next frame. So we sleep
+	 * for longer to try and keep the client running.
+	 *
+	 * We need ~5us to enable the irq, ~20us to hide a context switch,
+	 * we use 250us to keep the cache hot.
+	 */
+	if (__i915_spin_request(req, wait.seqno, state, 250))
 		goto complete;
 
 	set_current_state(state);
@@ -1231,8 +1243,13 @@ long i915_wait_request(struct drm_i915_gem_request *req,
 		    __i915_wait_request_check_and_reset(req))
 			continue;
 
-		/* Only spin if we know the GPU is processing this request */
-		if (__i915_spin_request(req, wait.seqno, state, 2))
+		/*
+		 * A quick spin now we are on the CPU to offset the cost of
+		 * context switching away (and so spin for roughly the same as
+		 * the scheduler latency). We only spin if we know the GPU is
+		 * processing this request, and so likely to finish shortly.
+		 */
+		if (__i915_spin_request(req, wait.seqno, state, 20))
 			break;
 
 		if (!intel_wait_check_request(&wait, req)) {
-- 
2.14.2