[Intel-gfx] [PATCH] drm/i915: Use exponential backoff for wait_for()

Chris Wilson chris@chris-wilson.co.uk
Tue Nov 21 20:40:07 UTC 2017


Quoting Ville Syrjälä (2017-11-21 17:29:43)
> On Tue, Nov 21, 2017 at 05:11:29PM +0000, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-11-21 17:00:00)
> > > 
> > > On 21/11/2017 15:24, Chris Wilson wrote:
> > > > Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> > > > start with a small sleep and exponentially increase the sleep on each
> > > > cycle.
> > > > 
> > > > A good example of a beneficiary is the guc mmio communication channel.
> > > > Typically we expect (and so spin) for 10us for a quick response, but this
> > > > doesn't cover everything and so sometimes we fall back to the millisecond+
> > > > sleep. This incurs a significant delay in time-critical operations like
> > > > preemption (igt/gem_exec_latency), which can be improved significantly by
> > > > using a small sleep after the spin fails.
> > > > 
> > > > We've made this suggestion many times, but had little experimental data
> > > > to support adding the complexity.
> > > > 
> > > > References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > Cc: John Harrison <John.C.Harrison@intel.com>
> > > > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > > > Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> > > > ---
> > > >   drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> > > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > > index 69aab324aaa1..c1ea9a009eb4 100644
> > > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > > @@ -50,6 +50,7 @@
> > > >    */
> > > >   #define _wait_for(COND, US, W) ({ \
> > > >       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> > > > +     long wait__ = 1;                                                \
> > > >       int ret__;                                                      \
> > > >       might_sleep();                                                  \
> > > >       for (;;) {                                                      \
> > > > @@ -62,7 +63,9 @@
> > > >                       ret__ = -ETIMEDOUT;                             \
> > > >                       break;                                          \
> > > >               }                                                       \
> > > > -             usleep_range((W), (W) * 2);                             \
> > > > +             usleep_range(wait__, wait__ * 2);                       \
> > > > +             if (wait__ < (W))                                       \
> > > > +                     wait__ <<= 1;                                   \
> > > >       }                                                               \
> > > >       ret__;                                                          \
> > > >   })
> > > > 
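(To spell out the backoff with the usual wait_for() parameters, i.e.
W = 1000us: the sleep sequence runs 1, 2, 4, ..., 512, 1024, 1024, ...
microseconds, so a slow wait converges on the old fixed ~1ms period after
ten iterations, while an early completion is noticed within a few
microseconds instead of costing a full millisecond.)
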
> > > 
> > > I would start the period at 10us since a) <10us is not recommended for
> > > the usleep family, and b) most callers specify ms timeouts, so a <10us
> > > poll is perhaps overkill.
> > 
> > Don't forget the majority of the callers are now via wait_for_register
> > and so have that 10us poll prior to the sleeping wait_for().
> > 
> > If there's an ARCH_USLEEP_MIN_WHATNOT, that would be useful.
> > 
> > > Latency sensitive callers like __intel_wait_for_register_us can be 
> > > tweaked at the call site to provide what they want.
> > > 
> > > For the actual guc mmio send it sounds like it should pass in 20us to 
> > > __intel_wait_for_register_us (referring to John's explanation email) to 
> > > cover 99% of the cases. And then the remaining 1% could be fine with a 
> > > 10us delay?
> > 
> > That it fixed that was a side-effect ;) It just happened to be something
> > that I could measure the latency of in userspace. I'd rather we have
> > something generic that does have a demonstrable improvement.
> > 
> > > Otherwise we are effectively turning _wait_for into a partial busy loop,
> > > or whatever the inefficiency of a <10us usleep amounts to. I mean, it
> > > makes no practical difference to do a handful of quick loops there, but
> > > it feels a bit inelegant.
> > 
> > We already do use the hybrid busy-looping for wait_for_register (and
> > everybody is meant to be using wait_for_register, the exceptions should
> > be rare and well justified).
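
For anyone following along, the hybrid scheme amounts to: busy-poll for the
first ~10us on the expectation that most waits complete almost immediately,
and only then drop into the sleeping wait_for() loop. A simplified sketch,
not the exact i915 code (the function name is made up; it assumes kernel
context, i.e. the ktime helpers, cpu_relax() and i915's wait_for()):

static int hybrid_wait_for(bool (*done)(void),
			   unsigned int fast_us, unsigned int slow_ms)
{
	ktime_t end = ktime_add_us(ktime_get_raw(), fast_us);

	/* Phase 1: busy-poll; the common case should complete in here. */
	do {
		if (done())
			return 0;
		cpu_relax();
	} while (ktime_before(ktime_get_raw(), end));

	/* Phase 2: rare slow path, the sleeping poll under discussion. */
	return wait_for(done(), slow_ms);
}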
> 
> We do have to bear 96dabe99cae8 ("drm/i915: Avoid busy-spinning on
> VLV_GTLC_PW_STATUS mmio") in mind. Touching wait_for() might break
> that again. Probably the correct choice for these exceptions would
> be to specify a minimum polling interval explicitly.

Fortunately, that was triggered by BAT:

https://bugs.freedesktop.org/show_bug.cgi?id=100718#c4
> Close to 100% reproducing rate on BYT-n2820 and BSW-n3050 with
> DRM-Tip:
> igt@pm_rpm@gem-execbuf-stress
> igt@pm_rpm@gem-execbuf-stress-extra-wait

Hopefully, the timing hasn't changed significantly since then and we can
rely on current CI to trip over it if the usleep interval is too small.

Thanks for the reminder!

> The other cases that come to mind are the wait_for_pipe_off() type of
> things which generally measure several milliseconds. Faster polling
> won't really help those cases that much except if the pipe already
> happens to be close to vblank when we shut it down. But as long as
> the max polling interval gets capped to a few milliseconds I guess
> the exponential backoff shouldn't hurt us either.

Indeed. Starting at 10us, it'll take us 8 cycles (total of 2.5ms) to hit
the timeout, so 7 more loops than required. Likely in the noise, but
(Wmin, Wmax) is an easy improvement.
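Something along these lines (completely untested sketch; the exact numbers
are up for grabs):

#define _wait_for(COND, US, Wmin, Wmax) ({				\
	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
	long wait__ = (Wmin); /* keep >= 10us to suit usleep_range() */\
	int ret__;							\
	might_sleep();							\
	for (;;) {							\
		bool expired__ = time_after(jiffies, timeout__);	\
		if (COND) {						\
			ret__ = 0;					\
			break;						\
		}							\
		if (expired__) {					\
			ret__ = -ETIMEDOUT;				\
			break;						\
		}							\
		usleep_range(wait__, wait__ * 2);			\
		if (wait__ < (Wmax))					\
			wait__ <<= 1;					\
	}								\
	ret__;								\
})

with wait_for() passing, say, (10, 1000) so the sleeping poll starts at
10us and backs off towards the old ~1ms period, leaving latency-critical
callers free to pick a tighter window at the call site.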
-Chris