[Intel-gfx] [PATCH] drm/i915: Use exponential backoff for wait_for()
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Nov 21 17:00:00 UTC 2017
On 21/11/2017 15:24, Chris Wilson wrote:
> Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> start with a small sleep and exponentially increase the sleep on each
> cycle.
>
> A good example of a beneficiary is the guc mmio communication channel.
> Typically we expect (and so spin) for 10us for a quick response, but this
> doesn't cover everything and so sometimes we fall back to the millisecond+
> sleep. This incurs a significant delay in time-critical operations like
> preemption (igt/gem_exec_latency), which can be improved significantly by
> using a small sleep after the spin fails.
>
> We've made this suggestion many times, but had little experimental data
> to support adding the complexity.
>
> References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: John Harrison <John.C.Harrison at intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> Cc: Ville Syrjala <ville.syrjala at linux.intel.com>
> ---
> drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 69aab324aaa1..c1ea9a009eb4 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -50,6 +50,7 @@
> */
>  #define _wait_for(COND, US, W) ({ \
>  	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
> +	long wait__ = 1; \
>  	int ret__; \
>  	might_sleep(); \
>  	for (;;) { \
> @@ -62,7 +63,9 @@
>  			ret__ = -ETIMEDOUT; \
>  			break; \
>  		} \
> -		usleep_range((W), (W) * 2); \
> +		usleep_range(wait__, wait__ * 2); \
> +		if (wait__ < (W)) \
> +			wait__ <<= 1; \
>  	} \
>  	ret__; \
>  })
>
I would start the period at 10us, since a) sleeps shorter than 10us are
not recommended for the usleep family, and b) most callers specify ms
timeouts, so polling at under 10us is probably overkill.
Latency-sensitive callers like __intel_wait_for_register_us can be
tweaked at the call site to get what they want.
For the actual guc mmio send, it sounds like it should pass 20us to
__intel_wait_for_register_us (going by John's explanation email) to
cover 99% of the cases. The remaining 1% could then be fine with a
10us delay?
Otherwise we are effectively making _wait_for partially busy-loop, or
incur whatever the inefficiency of a <10us usleep is. I mean, a handful
of quick loops there makes no practical difference, but it feels a bit
inelegant.
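To make the difference between a 1us and a 10us starting period concrete,
here is a rough userspace sketch of the sleep schedule the patched loop
would request. backoff_schedule() and its parameters are hypothetical
names made up for this illustration, not anything in i915; it only records
the lower bound passed to usleep_range() on each cycle and ignores timer
slack and the time spent evaluating COND.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not kernel code).
 *
 * Records the minimum sleep, in microseconds, requested on each cycle of a
 * backoff loop shaped like the patched _wait_for():
 *
 *     usleep_range(wait, wait * 2);
 *     if (wait < W)
 *             wait <<= 1;
 *
 * The loop stops once the summed minimum sleeps reach budget_us or the
 * output array is full. Returns the number of sleeps recorded.
 */
static size_t backoff_schedule(unsigned long start_us, unsigned long cap_us,
			       unsigned long budget_us,
			       unsigned long *sleeps, size_t max_sleeps)
{
	unsigned long wait = start_us;
	unsigned long elapsed = 0;
	size_t n = 0;

	assert(start_us > 0 && cap_us > 0);

	while (elapsed < budget_us && n < max_sleeps) {
		sleeps[n++] = wait;
		elapsed += wait;
		/* Same condition as the patch: double while below W. */
		if (wait < cap_us)
			wait <<= 1;
	}

	return n;
}
```

With W = 1000us, a start of 1us gives 1, 2, 4, 8, 16, ... so the first
four sleeps are below 10us, while a start of 10us gives 10, 20, 40, 80, ...
with none below it. Note that because the doubling only checks wait < W
before shifting, the period settles at the first power-of-two multiple of
the start that is >= W (1024us or 1280us here), not at W itself.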
Regards,
Tvrtko