[Intel-gfx] [PATCH] drm/i915: Use exponential backoff for wait_for()
Chris Wilson
chris at chris-wilson.co.uk
Tue Nov 21 17:11:29 UTC 2017
Quoting Tvrtko Ursulin (2017-11-21 17:00:00)
>
> On 21/11/2017 15:24, Chris Wilson wrote:
> > Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> > start with a small sleep and exponentially increase the sleep on each
> > cycle.
> >
> > A good example of a beneficiary is the guc mmio communication channel.
> > Typically we expect (and so spin) for 10us for a quick response, but this
> > doesn't cover everything and so sometimes we fall back to the millisecond+
> > sleep. This incurs a significant delay in time-critical operations like
> > preemption (igt/gem_exec_latency), which can be improved significantly by
> > using a small sleep after the spin fails.
> >
> > We've made this suggestion many times, but had little experimental data
> > to support adding the complexity.
> >
> > References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: John Harrison <John.C.Harrison at intel.com>
> > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > Cc: Ville Syrjala <ville.syrjala at linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 69aab324aaa1..c1ea9a009eb4 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -50,6 +50,7 @@
> > */
> > #define _wait_for(COND, US, W) ({ \
> > unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
> > + long wait__ = 1; \
> > int ret__; \
> > might_sleep(); \
> > for (;;) { \
> > @@ -62,7 +63,9 @@
> > ret__ = -ETIMEDOUT; \
> > break; \
> > } \
> > - usleep_range((W), (W) * 2); \
> > + usleep_range(wait__, wait__ * 2); \
> > + if (wait__ < (W)) \
> > + wait__ <<= 1; \
> > } \
> > ret__; \
> > })
> >
>
> I would start the period at 10us since a) <10us is not recommended for
> usleep family, b) most callers specify ms timeouts so <10us poll is
> perhaps an overkill.
Don't forget the majority of the callers are now via wait_for_register
and so have that 10us poll prior to the sleeping wait_for().
If there's an ARCH_USLEEP_MIN_WHATNOT that would be useful.
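To make the resulting sleep sequence concrete, here is a minimal
userspace sketch, not driver code. backoff_schedule() is a hypothetical
helper that records the lower bound each usleep_range() call in the
patched loop would request for a given W (in microseconds):

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of i915): fill out[0..n-1] with the
 * lower bound of each successive sleep the patched _wait_for() loop
 * would request when the caller passes a wait period of w_us.
 */
static size_t backoff_schedule(long w_us, long *out, size_t n)
{
	long wait = 1;	/* the patch starts the period at 1us */
	size_t i;

	for (i = 0; i < n; i++) {
		out[i] = wait;		/* usleep_range(wait, wait * 2) */
		if (wait < w_us)
			wait <<= 1;	/* double until the period reaches W */
	}
	return i;
}
```

For W = 10 the first six sleeps come out as 1, 2, 4, 8, 16, 16: the
period stops doubling at the first power of two that is not below W, so
it can overshoot W slightly.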
> Latency sensitive callers like __intel_wait_for_register_us can be
> tweaked at the call site to provide what they want.
>
> For the actual guc mmio send it sounds like it should pass in 20us to
> __intel_wait_for_register_us (referring to John's explanation email) to
> cover 99% of the cases. And then the remaining 1% could be fine with a
> 10us delay?
That it fixed the guc mmio case was a side-effect ;) It just happened to
be something whose latency I could measure from userspace. I'd rather we
have something generic that has a demonstrable improvement.
> Otherwise we are effectively making _wait_for partially busy looping, or
> whatever the inefficiency in <10us usleep is. I mean, it makes no
> practical difference to make a handful of quick loops there but it feels
> a bit inelegant.
We already do use the hybrid busy-looping for wait_for_register (and
everybody is meant to be using wait_for_register, the exceptions should
be rare and well justified). The purpose of referencing 1758b90e38f5
("drm/i915: Use a hybrid scheme for fast register waits") was to remind
ourselves of the scheme and its benefits.
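As a reminder of the general shape of such a hybrid wait, a rough
userspace sketch follows. It is an illustration under assumptions, not
the i915 implementation: hybrid_wait(), now_us() and the two callbacks
are hypothetical names, and nanosleep() stands in for usleep_range().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock reading in microseconds. */
static long long now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/*
 * Hypothetical hybrid wait: busy-poll cond() for spin_us, then poll with
 * sleeps that start at 1us and double until they reach max_sleep_us.
 * Returns 0 once cond() is true, or -ETIMEDOUT after timeout_us.
 */
static int hybrid_wait(bool (*cond)(void *), void *data,
		       long spin_us, long timeout_us, long max_sleep_us)
{
	long long start = now_us();
	long wait = 1;

	/* Fast path: spin briefly, expecting a quick response. */
	while (now_us() - start < spin_us)
		if (cond(data))
			return 0;

	/* Slow path: sleep between polls, backing off exponentially. */
	for (;;) {
		/* Sample the deadline first so cond() gets one final check. */
		bool expired = now_us() - start >= timeout_us;
		struct timespec ts = {
			.tv_sec = wait / 1000000,
			.tv_nsec = (wait % 1000000) * 1000,
		};

		if (cond(data))
			return 0;
		if (expired)
			return -ETIMEDOUT;

		nanosleep(&ts, NULL);
		if (wait < max_sleep_us)
			wait <<= 1;
	}
}

/* Example condition: becomes true after a fixed number of polls. */
static bool countdown_done(void *data)
{
	int *remaining = data;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}

/* Example condition that never becomes true, to exercise the timeout. */
static bool never_true(void *data)
{
	(void)data;
	return false;
}
```

A caller mirroring the numbers in this thread would pass something like
hybrid_wait(countdown_done, &polls, 10, 2000, 1000): a 10us spin, then
sleeps that grow from 1us up to roughly 1ms.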
-Chris