[Intel-gfx] [PATCH v3] drm/i915: Use exponential backoff for wait_for()
Michał Winiarski
michal.winiarski at intel.com
Fri Nov 24 12:37:56 UTC 2017
On Tue, Nov 21, 2017 at 08:59:51PM +0000, Chris Wilson wrote:
> Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> start with a small sleep and exponentially increase the sleep on each
> cycle.
>
> A good example of a beneficiary is the guc mmio communication channel.
> Typically we expect (and so spin) for 10us for a quick response, but this
> doesn't cover everything and so sometimes we fallback to the millisecond+
> sleep. This incurs a significant delay in time-critical operations like
> preemption (igt/gem_exec_latency), which can be improved significantly by
> using a small sleep after the spin fails.
>
> We've made this suggestion many times, but had little experimental data
> to support adding the complexity.
>
> v2: Bump the minimum usleep to 10us on advice of
> Documentation/timers/timers-howto.txt (Tvrko)
> v3: Specify min, max range for usleep intervals -- some code may
> crucially depend upon and so want to specify the sleep pattern.
Since we see the effects for GuC preeption, let's gather some evidence.
(SKL)
intel_guc_send_mmio latency: 100 rounds of gem_exec_latency --r '*-preemption'
drm-tip:
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 44 | |
16 -> 31 : 1088 | |
32 -> 63 : 832 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 12 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 29899 |********* |
2048 -> 4095 : 131033 |****************************************|
exponential backoff:
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 58 | |
16 -> 31 : 2009 | |
32 -> 63 : 153615 |****************************************|
64 -> 127 : 4360 |* |
128 -> 255 : 14 | |
256 -> 511 : 4236 |* |
512 -> 1023 : 4 | |
s/10us/100us in __intel_wait_for_register_fw fast_timeout (without exponential
backoff):
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 946 | |
16 -> 31 : 159803 |****************************************|
32 -> 63 : 3440 | |
64 -> 127 : 33 | |
128 -> 255 : 1 | |
256 -> 511 : 74 | |
same as the previous one, except with lower GPU clock, let's say max == hw_min:
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 26 | |
32 -> 63 : 160411 |****************************************|
64 -> 127 : 3402 | |
128 -> 255 : 12 | |
256 -> 511 : 50 | |
512 -> 1023 : 3 | |
1024 -> 2047 : 1 | |
2048 -> 4095 : 2 | |
We're waaay better with exponential backoff compared with current drm-tip, and
we're almost as good as with spinning for a longer period. Unfortunately with
GuC the delay depends on platform (and/or GPU clock?), so we're still seeing the
long tail without exponential backoff if we manage to miss the fast timeout.
> References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: John Harrison <John.C.Harrison at intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> Cc: Ville Syrjala <ville.syrjala at linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski at intel.com>
-Michał
> ---
> drivers/gpu/drm/i915/intel_drv.h | 11 +++++++----
> drivers/gpu/drm/i915/intel_pm.c | 2 +-
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 635a96fcd788..c00441a3d649 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -48,8 +48,9 @@
> * having timed out, since the timeout could be due to preemption or similar and
> * we've never had a chance to check the condition before the timeout.
> */
> -#define _wait_for(COND, US, W) ({ \
> +#define _wait_for(COND, US, Wmin, Wmax) ({ \
> unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
> + long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \
> int ret__; \
> might_sleep(); \
> for (;;) { \
> @@ -62,12 +63,14 @@
> ret__ = -ETIMEDOUT; \
> break; \
> } \
> - usleep_range((W), (W) * 2); \
> + usleep_range(wait__, wait__ * 2); \
> + if (wait__ < (Wmax)) \
> + wait__ <<= 1; \
> } \
> ret__; \
> })
>
> -#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 1000)
> +#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000)
>
> /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
> #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> @@ -116,7 +119,7 @@
> int ret__; \
> BUILD_BUG_ON(!__builtin_constant_p(US)); \
> if ((US) > 10) \
> - ret__ = _wait_for((COND), (US), 10); \
> + ret__ = _wait_for((COND), (US), 10, 10); \
> else \
> ret__ = _wait_for_atomic((COND), (US), 0); \
> ret__; \
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index e445ec174831..f07f14ae198d 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -9294,7 +9294,7 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
> ret = 0;
> goto out;
> }
> - ret = _wait_for(COND, timeout_base_ms * 1000, 10);
> + ret = _wait_for(COND, timeout_base_ms * 1000, 10, 10);
> if (!ret)
> goto out;
>
> --
> 2.15.0
>
More information about the Intel-gfx
mailing list