[Intel-gfx] [PATCH] drm/i915: Use exponential backoff for wait_for()

Tue Nov 21 17:00:00 UTC 2017

On 21/11/2017 15:24, Chris Wilson wrote:
> Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> start with a small sleep and exponentially increase the sleep on each
> cycle.
> 
> A good example of a beneficiary is the guc mmio communication channel.
> Typically we expect (and so spin) for 10us for a quick response, but this
> doesn't cover everything and so sometimes we fall back to the millisecond+
> sleep. This incurs a significant delay in time-critical operations like
> preemption (igt/gem_exec_latency), which can be improved significantly by
> using a small sleep after the spin fails.
> 
> We've made this suggestion many times, but had little experimental data
> to support adding the complexity.
> 
> References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: John Harrison <John.C.Harrison at intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> Cc: Ville Syrjala <ville.syrjala at linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 69aab324aaa1..c1ea9a009eb4 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -50,6 +50,7 @@
>    */
>   #define _wait_for(COND, US, W) ({ \
>   	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
> +	long wait__ = 1;						\
>   	int ret__;							\
>   	might_sleep();							\
>   	for (;;) {							\
> @@ -62,7 +63,9 @@
>   			ret__ = -ETIMEDOUT;				\
>   			break;						\
>   		}							\
> -		usleep_range((W), (W) * 2);				\
> +		usleep_range(wait__, wait__ * 2);			\
> +		if (wait__ < (W))					\
> +			wait__ <<= 1;					\
>   	}								\
>   	ret__;								\
>   })
> 
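The sleep schedule the patch produces can be modelled outside the kernel. The following is a minimal user-space sketch, not i915 code: backoff_schedule() is a hypothetical helper that records only the lower bound passed to each usleep_range() call, and it ignores timer slack and the jiffies timeout. Assuming W = 1000 (matching the fixed 1ms sleep the commit message describes), the sleeps run 1, 2, 4, ..., 512us and then settle at 1024us, because the doubling only stops once wait__ is no longer below W. The final usleep_range(1024, 2048) is therefore slightly longer than the old usleep_range(1000, 2000).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the patched _wait_for() loop: each
 * failed poll sleeps via usleep_range(wait, wait * 2), then wait doubles
 * while it is still below max_us (the macro's W). Only the lower bound of
 * each sleep is recorded; timer slack and the timeout are not modelled.
 */
static void backoff_schedule(long start_us, long max_us, long *out, size_t n)
{
	long wait = start_us;
	size_t i;

	assert(start_us > 0 && max_us > 0);
	for (i = 0; i < n; i++) {
		out[i] = wait;		/* usleep_range(wait, wait * 2) */
		if (wait < max_us)	/* same guard as the patch */
			wait <<= 1;
	}
}
```

Calling backoff_schedule(1, 1000, s, 12) fills s with 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 1024.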

I would start the period at 10us, since a) sleeps below 10us are not
recommended for the usleep family, and b) most callers specify millisecond
timeouts, so a sub-10us poll interval is perhaps overkill.
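To put numbers on the 10us suggestion, here is a hypothetical user-space sketch of the same doubling rule as the patch (wait doubles while it is below W). It assumes W = 1000 and counts only the lower bound of each usleep_range() call. The helper names are illustrative, not i915 functions. Starting at 1us gives four sub-10us sleeps (1, 2, 4, 8) and spends 1023us in growing sleeps before the interval stops doubling. Starting at 10us gives no sub-10us sleeps and spends 1270us before the interval stops doubling.

```c
#include <assert.h>

/*
 * Hypothetical helpers modelling the doubling rule from the patch,
 * starting from start_us. Sleep lengths are usleep_range() lower bounds.
 */

/* Number of sleeps shorter than min_us issued before the schedule
 * reaches min_us (or stops growing at max_us). */
static int short_sleeps(long start_us, long min_us, long max_us)
{
	long wait = start_us;
	int n = 0;

	assert(start_us > 0);
	while (wait < min_us && wait < max_us) {
		n++;
		wait <<= 1;
	}
	return n;
}

/* Minimum total time slept while the sleep is still doubling,
 * i.e. before the first sleep of at least max_us. */
static long sleep_before_cap_us(long start_us, long max_us)
{
	long wait = start_us, total = 0;

	assert(start_us > 0);
	while (wait < max_us) {
		total += wait;
		wait <<= 1;
	}
	return total;
}
```

The trade-off is a slightly slower ramp to the steady-state interval in exchange for never issuing sub-10us sleeps.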

Latency-sensitive callers like __intel_wait_for_register_us can be
tweaked at the call site to get the behaviour they want.

For the actual GuC MMIO send, it sounds like it should pass 20us to
__intel_wait_for_register_us (going by John's explanation email) to
cover 99% of the cases. The remaining 1% could then be fine with a
10us delay?
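The split suggested above (spin for the common case, short sleeps for the rest) can be compared with the behaviour the commit message describes (a 10us spin, then roughly 1ms sleeps) using a toy model. In this hypothetical sketch, observed_latency_us() is not an i915 function. Time is simulated in whole microseconds, polls take no time, and every sleep is assumed to wake exactly at its usleep_range() lower bound, so timer slack is ignored. Under those assumptions, a reply arriving at 25us is seen at 1010us with a 10us spin plus 1ms sleeps, versus 30us with a 20us spin plus 10us sleeps.

```c
#include <assert.h>

/*
 * Hypothetical model of a hybrid wait: busy-poll until spin_us, then
 * poll once after every sleep_us. Returns the simulated time at which a
 * reply arriving at reply_us is first observed, or -1 if it is not seen
 * by timeout_us. Polls are instantaneous and sleeps never overshoot.
 */
static long observed_latency_us(long reply_us, long spin_us,
				long sleep_us, long timeout_us)
{
	long t;

	assert(spin_us >= 0 && sleep_us > 0);
	if (reply_us <= spin_us)
		return reply_us;	/* caught while spinning */
	for (t = spin_us + sleep_us; t <= timeout_us; t += sleep_us)
		if (t >= reply_us)
			return t;	/* caught on a post-sleep poll */
	return -1;			/* timed out */
}
```

Real wakeups can land anywhere up to the usleep_range() upper bound, so these figures are best-case latencies.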

Otherwise we are effectively turning _wait_for into a partial busy loop,
or paying whatever the inefficiency of a sub-10us usleep is. I mean, a
handful of quick loops there makes no practical difference, but it feels
a bit inelegant.

Regards,

Tvrtko