[Intel-gfx] [PATCH 1/5] drm/i915: Expose the busyspin durations for i915_wait_request

Thu Aug 2 14:29:09 UTC 2018

Quoting Tvrtko Ursulin (2018-08-02 15:15:35)
> 
> On 28/07/2018 17:46, Chris Wilson wrote:
> > An interesting discussion regarding "hybrid interrupt polling" for NVMe
> > came to the conclusion that the ideal busyspin before sleeping was half
> > of the expected request latency (and better if it was already halfway
> > through that request). This suggested that we too should look again at
> > our tradeoff between spinning and waiting. Currently, our spin simply
> > tries to hide the cost of enabling the interrupt, which is good to avoid
> > penalising nop requests (i.e. test throughput) and not much else.
> > Studying real world workloads suggests that a spin of upto 500us can
> > dramatically boost performance, but the suggestion is that this is not
> > from avoiding interrupt latency per-se, but from secondary effects of
> > sleeping such as allowing the CPU reduce cstate and context switch away.
> > 
> > In a truly hybrid interrupt polling scheme, we would aim to sleep until
> > just before the request completed and then wake up in advance of the
> > interrupt and do a quick poll to handle completion. This is tricky for
> > ourselves at the moment as we are not recording request times, and since
> > we allow preemption, our requests are not on as a nicely ordered
> > timeline as IO. However, the idea is interesting, for it will certainly
> > help us decide when busyspinning is worthwhile.
> > 
> > v2: Expose the spin setting via Kconfig options for easier adjustment
> > and testing.
> > v3: Don't get caught sneaking in a change to the busyspin parameters.
> > v4: Explain more about the "hybrid interrupt polling" scheme that we
> > want to migrate towards.
> > 
> > Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
> > References: http://events.linuxfoundation.org/sites/events/files/slides/lemoal-nvme-polling-vault-2017-final_0.pdf
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > Reviewed-by: Sagar Kamble <sagar.a.kamble at intel.com>
> > ---
> >   drivers/gpu/drm/i915/Kconfig         |  6 +++++
> >   drivers/gpu/drm/i915/Kconfig.profile | 26 +++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c  | 39 +++++++++++++++++++++++++---
> >   3 files changed, 67 insertions(+), 4 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/Kconfig.profile
> > 
> > diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> > index 5c607f2c707b..387613f29cb0 100644
> > --- a/drivers/gpu/drm/i915/Kconfig
> > +++ b/drivers/gpu/drm/i915/Kconfig
> > @@ -132,3 +132,9 @@ depends on DRM_I915
> >   depends on EXPERT
> >   source drivers/gpu/drm/i915/Kconfig.debug
> >   endmenu
> > +
> > +menu "drm/i915 Profile Guided Optimisation"
> 
> Or something like a more generic drm/i915 Manual Tuning Options so it 
> sounds less like an automated thing where you can feed the results of 
> something straight in?

Hah, good called. PGO was taken right out of the gcc manual, so yeah
it's a bit misleading.

> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> We could add PMU metrics to count hit/missed busy spins so profile 
> guided bit becomes more direct?

Sensible. I wonder though how much we can do already with kprobes?

Another one I think I'd like here is the DMA_LATENCY qos parameter. If
you haven't seen https://patchwork.freedesktop.org/patch/241571/ that's
worth throwing against media-bench.
-Chris