[Intel-gfx] [PATCH] drm/i915: Use exponential backoff for wait_for()

Tue Nov 21 17:11:29 UTC 2017

Quoting Tvrtko Ursulin (2017-11-21 17:00:00)
> 
> On 21/11/2017 15:24, Chris Wilson wrote:
> > Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> > start with a small sleep and exponentially increase the sleep on each
> > cycle.
> > 
> > A good example of a beneficiary is the guc mmio communication channel.
> > Typically we expect (and so spin) for 10us for a quick response, but this
> > doesn't cover everything and so sometimes we fallback to the millisecond+
> > sleep. This incurs a significant delay in time-critical operations like
> > preemption (igt/gem_exec_latency), which can be improved significantly by
> > using a small sleep after the spin fails.
> > 
> > We've made this suggestion many times, but had little experimental data
> > to support adding the complexity.
> > 
> > References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: John Harrison <John.C.Harrison at intel.com>
> > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > Cc: Ville Syrjala <ville.syrjala at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 69aab324aaa1..c1ea9a009eb4 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -50,6 +50,7 @@
> >    */
> >   #define _wait_for(COND, US, W) ({ \
> >       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> > +     long wait__ = 1;                                                \
> >       int ret__;                                                      \
> >       might_sleep();                                                  \
> >       for (;;) {                                                      \
> > @@ -62,7 +63,9 @@
> >                       ret__ = -ETIMEDOUT;                             \
> >                       break;                                          \
> >               }                                                       \
> > -             usleep_range((W), (W) * 2);                             \
> > +             usleep_range(wait__, wait__ * 2);                       \
> > +             if (wait__ < (W))                                       \
> > +                     wait__ <<= 1;                                   \
> >       }                                                               \
> >       ret__;                                                          \
> >   })
> > 
> 
> I would start the period at 10us since a) <10us is not recommended for 
> usleep family, b) most callers specify ms timeouts so <10us poll is 
> perhaps an overkill.

Don't forget the majority of the callers are now via wait_for_register
and so have that 10us poll prior to the sleeping wait_for().

If there's an ARCH_USLEEP_MIN_WHATNOT that would be useful.

> Latency sensitive callers like __intel_wait_for_register_us can be 
> tweaked at the call site to provide what they want.
> 
> For the actual guc mmio send it sounds like it should pass in 20us to 
> __intel_wait_for_register_us (referring to John's explanation email) to 
> cover 99% of the cases. And then the remaining 1% could be fine with a 
> 10us delay?

That it fixed that was a side-effect ;) It just happened to be something
that I could measure the latency of in userspace. I'd rather we have
something generic that does have a demonstrable improvement.

> Otherwise we are effectively making _wait_for partially busy looping, or 
> whatever the inefficiency in <10us usleep is. I mean, it makes no 
> practical difference to make a handful of quick loops there but it feels 
> a bit inelegant.

We already do use the hybrid busy-looping for wait_for_register (and
everybody is meant to be using wait_for_register, the exceptions should
be rare and well justified). The purpose of referencing 1758b90e38f5
("drm/i915: Use a hybrid scheme for fast register waits") was to remind
ourselves of the scheme and its benefits.
-Chris