[Intel-gfx] [PATCH] drm/i915: Increase the busyspin durations for i915_wait_request
Kamble, Sagar A
sagar.a.kamble at intel.com
Fri Sep 15 09:15:31 UTC 2017
Thanks Chris. LGTM.
Minor inputs below
On 9/14/2017 3:28 PM, Chris Wilson wrote:
> An interesting discussion regarding "hybrid interrupt polling" for NVMe
> came to the conclusion that the ideal busyspin before sleeping was half
> of the expected request latency (and better if it was already halfway
> through that request). This suggested that we too should look again at
> our tradeoff between spinning and waiting. Currently, our spin simply
> tries to hide the cost of enabling the interrupt, which is good to avoid
> penalising nop requests (i.e. test throughput) and not much else.
> Studying real world workloads suggests that a spin of upto 500us can
> dramatically boost performance, but the suggestion is that this is not
> from avoiding interrupt latency per-se, but from secondary effects of
> sleeping such as allowing the CPU reduce cstate and context switch away.
> To offset those costs from penalising the active client, bump the initial
> spin somewhat to 250us and the secondary spin to 20us to balance the cost
> of another context switch following the interrupt.
>
> Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Ben Widawsky <ben at bwidawsk.net>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_request.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 813a3b546d6e..ccbdaf6a0e4d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -1155,8 +1155,20 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
>
> - /* Optimistic short spin before touching IRQs */
> - if (i915_spin_request(req, state, 5))
> + /* Optimistic short spin before touching IRQs.
> + *
> + * We use a rather large value here to offset the penalty of switching
> + * away from the active task. Frequently, the client will wait upon
> + * an old swapbuffer to throttle itself to remain within a frame of
> + * the gpu. If the client is running in lockstep with the gpu, then
> + * it should not be waiting long at all, and a sleep now will incur
> + * extra scheduler latency in producing the next frame. So we sleep
Typo? Should this be "So we spin for longer"
> + * for longer to try and keep the client running.
> + *
> + * We need ~5us to enable the irq, ~20us to hide a context switch,
> + * we use 250us to keep the cache hot.
> + */
> + if (i915_spin_request(req, state, 250))
Hybrid polling introduces sleep of 1/2 request duration prior to poll of
1/2 request duration.
Should we reorder here to achieve the same?
> goto complete;
>
> set_current_state(state);
> @@ -1212,8 +1224,13 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> __i915_wait_request_check_and_reset(req))
> continue;
>
> - /* Only spin if we know the GPU is processing this request */
> - if (i915_spin_request(req, state, 2))
> + /*
> + * A quick spin now we are on the CPU to offset the cost of
> + * context switching away (and so spin for roughly the same as
> + * the scheduler latency). We only spin if we know the GPU is
> + * processing this request, and so likely to finish shortly.
> + */
> + if (i915_spin_request(req, state, 20))
> break;
>
> if (!intel_wait_check_request(&wait, req)) {
More information about the Intel-gfx
mailing list