[Intel-gfx] [PATCH v3 1/2] drm/i915: Expose the busyspin durations for i915_wait_request
Chris Wilson
chris at chris-wilson.co.uk
Mon Nov 27 09:39:44 UTC 2017
Quoting Sagar Arun Kamble (2017-11-27 07:20:01)
>
>
> On 11/26/2017 5:50 PM, Chris Wilson wrote:
> > An interesting discussion regarding "hybrid interrupt polling" for NVMe
> > came to the conclusion that the ideal busyspin before sleeping
>
> I think the hybrid approach suggests sleep (1/2 duration) and then busyspin (1/2 duration)
> (for small I/O sizes), so this should be "busyspin after sleeping", although we are not doing exactly the same.
It does, and we are not. For reasons I thought I had described ... but not
here, apparently. The differences between hybrid interrupt polling and our
approach should be included in the comments as well.
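Roughly, the two strategies compare like this (an illustrative,
userspace-flavoured sketch only, not the i915 code; request_completed()
and sleep_until_interrupt() are stand-ins for the seqno check and the
wait-queue sleep):

#include <stdbool.h>
#include <time.h>

bool request_completed(void);      /* stand-in: has the seqno passed? */
void sleep_until_interrupt(void);  /* stand-in: block until woken by the irq */

/* NVMe-style hybrid interrupt polling: sleep first, then busy-poll.
 * Assumes the expected request latency is under one second.
 */
void wait_hybrid(long expected_ns)
{
	struct timespec half = { .tv_sec = 0, .tv_nsec = expected_ns / 2 };

	nanosleep(&half, NULL);        /* sleep for ~half the expected latency */
	while (!request_completed())
		;                      /* then busy-poll for the remainder */
}

/* This patch: busy-spin first, and only then arm the irq and go to sleep. */
void wait_spin_then_sleep(long spin_ns)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		if (request_completed())
			return;        /* completed within the spin budget */
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < spin_ns);

	while (!request_completed())
		sleep_until_interrupt();   /* give up and wait for the irq */
}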
> > was half
> > of the expected request latency (and better if it was already halfway
> > through that request). This suggested that we too should look again at
> > our tradeoff between spinning and waiting. Currently, our spin simply
> > tries to hide the cost of enabling the interrupt, which is good to avoid
> > penalising nop requests (i.e. test throughput) and not much else.
> > Studying real world workloads suggests that a spin of up to 500us can
> > dramatically boost performance, but the suggestion is that this is not
> > from avoiding interrupt latency per se, but from the secondary effects of
> > sleeping, such as allowing the CPU to reduce its cstate and to context switch away.
> >
> > v2: Expose the spin setting via Kconfig options for easier adjustment
> > and testing.
> > v3: Don't get caught sneaking in a change to the busyspin parameters.
> >
> > Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > ---
> > drivers/gpu/drm/i915/Kconfig | 6 ++++++
> > drivers/gpu/drm/i915/Kconfig.profile | 23 +++++++++++++++++++++++
> > drivers/gpu/drm/i915/i915_gem_request.c | 28 ++++++++++++++++++++++++----
> > 3 files changed, 53 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/gpu/drm/i915/Kconfig.profile
> >
> > diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> > index dfd95889f4b7..eae90783f8f9 100644
> > --- a/drivers/gpu/drm/i915/Kconfig
> > +++ b/drivers/gpu/drm/i915/Kconfig
> > @@ -131,3 +131,9 @@ depends on DRM_I915
> > depends on EXPERT
> > source drivers/gpu/drm/i915/Kconfig.debug
> > endmenu
> > +
> > +menu "drm/i915 Profile Guided Optimisation"
> > + visible if EXPERT
> > + depends on DRM_I915
> > + source drivers/gpu/drm/i915/Kconfig.profile
> > +endmenu
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > new file mode 100644
> > index 000000000000..a1aed0e2aad5
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -0,0 +1,23 @@
> > +config DRM_I915_SPIN_REQUEST_IRQ
> > + int
> > + default 5 # microseconds
> > + help
> > + Before going to sleep to wait for a request (GPU operation) to complete,
> > + we may spend some time polling for its completion. As the IRQ may
> > + take a non-negligible time to set up, we do a short spin first to
> > + check whether the request will complete quickly.
> > +
> > + May be 0 to disable the initial spin.
> > +
> > +config DRM_I915_SPIN_REQUEST_CS
> > + int
> > + default 2 # microseconds
> > + help
> > + After sleeping for a request (GPU operation) to complete, we will
> > + be woken up on the completion of every request prior to the one
> > + being waited on. For very short requests, going back to sleep and
> > + being woken up again may add considerably to the wakeup latency. To
> > + avoid incurring extra latency from the scheduler, we may choose to
> > + spin prior to sleeping again.
> > +
> > + May be 0 to disable spinning after being woken.
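For reference, the intent is that the hard-coded 5us/2us spins in
i915_wait_request() become these symbols; the full hunk is trimmed from
the quote below, so this is only a sketch of the shape rather than the
exact patch:

	/* initial optimistic spin before arming the irq */
	if (__i915_spin_request(req, wait.seqno, state,
				CONFIG_DRM_I915_SPIN_REQUEST_IRQ))
		goto complete;

	/* on being woken for an earlier request's completion, spin
	 * briefly before going back to sleep
	 */
	if (__i915_spin_request(req, wait.seqno, state,
				CONFIG_DRM_I915_SPIN_REQUEST_CS))
		break;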
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index a90bdd26571f..7ac72a0a949c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -1198,8 +1198,21 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> > GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> > GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
> >
> > - /* Optimistic short spin before touching IRQs */
> > - if (__i915_spin_request(req, wait.seqno, state, 5))
> > + /* Optimistic spin before touching IRQs.
> > + *
> > + * We may use a rather large value here to offset the penalty of
> > + * switching away from the active task. Frequently, the client will
> > + * wait upon an old swapbuffer to throttle itself to remain within a
> > + * frame of the gpu. If the client is running in lockstep with the gpu,
> > + * then it should not be waiting long at all, and a sleep now will incur
> > + extra scheduler latency in producing the next frame. So we spin
> > + for longer to try and keep the client running.
> > + *
> > + * We need ~5us to enable the irq, ~20us to hide a context switch.
>
> This comment fits better with the next patch.
It's just a general overview of the sort of timescales we expect. I want
to explain the rationale behind having a spin and what spin durations we
have in mind. Maybe if I just add "upto"
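i.e. the timescale line in the comment would then read something like
(a sketch of the wording only, to be settled in the respin):

	 * We need upto ~5us to enable the irq, upto ~20us to hide a
	 * context switch.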
-Chris