[Intel-gfx] [PATCH] drm/i915: Expose the busyspin durations for i915_wait_request
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Nov 24 16:44:21 UTC 2017
On 24/11/2017 14:54, Chris Wilson wrote:
> An interesting discussion regarding "hybrid interrupt polling" for NVMe
> came to the conclusion that the ideal busyspin before sleeping was half
> of the expected request latency (and better if it was already halfway
> through that request). This suggested that we too should look again at
> our tradeoff between spinning and waiting. Currently, our spin simply
> tries to hide the cost of enabling the interrupt, which is good to avoid
> penalising nop requests (i.e. test throughput) and not much else.
> Studying real world workloads suggests that a spin of upto 500us can
> dramatically boost performance, but the suggestion is that this is not
> from avoiding interrupt latency per-se, but from secondary effects of
> sleeping such as allowing the CPU reduce cstate and context switch away.
>
> v2: Expose the spin setting via Kconfig options for easier adjustment
> and testing.
>
> Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Ben Widawsky <ben at bwidawsk.net>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> ---
> drivers/gpu/drm/i915/Kconfig | 6 ++++++
> drivers/gpu/drm/i915/Kconfig.profile | 23 +++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_gem_request.c | 28 ++++++++++++++++++++++++----
> 3 files changed, 53 insertions(+), 4 deletions(-)
> create mode 100644 drivers/gpu/drm/i915/Kconfig.profile
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index dfd95889f4b7..0553c3176109 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -131,3 +131,9 @@ depends on DRM_I915
> depends on EXPERT
> source drivers/gpu/drm/i915/Kconfig.debug
> endmenu
> +
> +menu "drm/i915 Profile Guided Optimisation"
> +depends on DRM_I915
> +depends on EXPERT
> +source drivers/gpu/drm/i915/Kconfig.profile
> +endmenu
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> new file mode 100644
> index 000000000000..c8fe5754466c
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -0,0 +1,23 @@
> +config DRM_I915_SPIN_REQUEST_IRQ
> + int
> + default 5 # microseconds
> + help
> + Before sleeping waiting for a request (GPU operation) to complete,
> + we may spend some time polling for its completion. As the IRQ may
> + take a non-negligible time to setup, we do a short spin first to
> + check if the request will complete quickly.
> +
> + May be 0 to disable the initial spin.
> +
> +config DRM_I915_SPIN_REQUEST_CS
> + int
> + default 20 # microseconds
> + help
> + After sleeping for a request (GPU operation) to complete, we will
> + be woken up on the completion of every request prior to the one
> + being waited on. For very short requests, going back to sleep and
> + be woken up again may add considerably to the wakeup latency. To
> + avoid incurring extra latency from the scheduler, we may choose to
> + spin prior to sleeping again.
> +
> + May be 0 to disable spinning after being woken.
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index a90bdd26571f..7ac72a0a949c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -1198,8 +1198,21 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
>
> - /* Optimistic short spin before touching IRQs */
> - if (__i915_spin_request(req, wait.seqno, state, 5))
> + /* Optimistic spin before touching IRQs.
> + *
> + * We may use a rather large value here to offset the penalty of
> + * switching away from the active task. Frequently, the client will
> + * wait upon an old swapbuffer to throttle itself to remain within a
> + * frame of the gpu. If the client is running in lockstep with the gpu,
> + * then it should not be waiting long at all, and a sleep now will incur
> + * extra scheduler latency in producing the next frame. So we sleep
> + * for longer to try and keep the client running.
> + *
> + * We need ~5us to enable the irq, ~20us to hide a context switch.
> + */
> + if (CONFIG_DRM_I915_SPIN_REQUEST_IRQ &&
> + __i915_spin_request(req, wait.seqno, state,
> + CONFIG_DRM_I915_SPIN_REQUEST_IRQ))
> goto complete;
>
> set_current_state(state);
> @@ -1255,8 +1268,15 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> __i915_wait_request_check_and_reset(req))
> continue;
>
> - /* Only spin if we know the GPU is processing this request */
> - if (__i915_spin_request(req, wait.seqno, state, 2))
Split into "Expose" and "Increase" patches just in case and for an
obvious commit messages? Otherwise I guess its OK to expose it like
this, even under the risk of that evergreen argument of it soon
appearing in some Gentoo configuration guide. :)
Regards,
Tvrtko
> + /*
> + * A quick spin now we are on the CPU to offset the cost of
> + * context switching away (and so spin for roughly the same as
> + * the scheduler latency). We only spin if we know the GPU is
> + * processing this request, and so likely to finish shortly.
> + */
> + if (CONFIG_DRM_I915_SPIN_REQUEST_CS &&
> + __i915_spin_request(req, wait.seqno, state,
> + CONFIG_DRM_I915_SPIN_REQUEST_CS))
> break;
>
> if (!intel_wait_check_request(&wait, req)) {
>
More information about the Intel-gfx
mailing list