[Intel-gfx] [PATCH] drm/i915: Increase the busyspin durations for i915_wait_request

Fri Sep 15 09:23:53 UTC 2017

Quoting Kamble, Sagar A (2017-09-15 10:15:31)
> Thanks Chris. LGTM.
> 
> Minor inputs below
> 
> 
> On 9/14/2017 3:28 PM, Chris Wilson wrote:
> > An interesting discussion regarding "hybrid interrupt polling" for NVMe
> > came to the conclusion that the ideal busyspin before sleeping was half
> > of the expected request latency (and better if it was already halfway
> > through that request). This suggested that we too should look again at
> > our tradeoff between spinning and waiting. Currently, our spin simply
> > tries to hide the cost of enabling the interrupt, which is good to avoid
> > penalising nop requests (i.e. test throughput) and not much else.
> > Studying real world workloads suggests that a spin of upto 500us can
> > dramatically boost performance, but the suggestion is that this is not
> > from avoiding interrupt latency per-se, but from secondary effects of
> > sleeping such as allowing the CPU reduce cstate and context switch away.
> > To offset those costs from penalising the active client, bump the initial
> > spin somewhat to 250us and the secondary spin to 20us to balance the cost
> > of another context switch following the interrupt.
> >
> > Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_request.c | 25 +++++++++++++++++++++----
> >   1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index 813a3b546d6e..ccbdaf6a0e4d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -1155,8 +1155,20 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> >       GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> >       GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
> >   
> > -     /* Optimistic short spin before touching IRQs */
> > -     if (i915_spin_request(req, state, 5))
> > +     /* Optimistic short spin before touching IRQs.
> > +      *
> > +      * We use a rather large value here to offset the penalty of switching
> > +      * away from the active task. Frequently, the client will wait upon
> > +      * an old swapbuffer to throttle itself to remain within a frame of
> > +      * the gpu. If the client is running in lockstep with the gpu, then
> > +      * it should not be waiting long at all, and a sleep now will incur
> > +      * extra scheduler latency in producing the next frame. So we sleep
> Typo? Should this be "So we spin for longer"
> > +      * for longer to try and keep the client running.
> > +      *
> > +      * We need ~5us to enable the irq, ~20us to hide a context switch,
> > +      * we use 250us to keep the cache hot.
> > +      */
> > +     if (i915_spin_request(req, state, 250))
> Hybrid polling introduces sleep of 1/2 request duration prior to poll of 
> 1/2 request duration.
> Should we reorder here to achieve the same?

No, we are not tracking estimated time of completion for a request. We
don't know when the request was started on the hw, nor when it is likely
to complete. Now we could bookend the batch with TIMESTAMPs and with a
lot of extra work track an ewma per context per engine, but even with
some magic, I don't think that will work across preemption.

Yes, it makes a lot of sense not to spin if we don't expect the request
to complete. The closest we have at present (and currently use) is merely
a flag saying whether the request is currently executing.
-Chris