[Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

Wed Jan 9 12:06:59 UTC 2019

Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
> 
> On 07/01/2019 15:29, Chris Wilson wrote:
> > In the continual quest to reduce the amount of global work required when
> > submitting requests, replace i915_retire_requests() after allocation
> > failure with retiring just our ring.
> > 
> > References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 33 +++++++++++++++++++++--------
> >   1 file changed, 24 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 1e158eb8cb97..9ba218c6029b 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >       return NOTIFY_DONE;
> >   }
> >   
> > +static noinline struct i915_request *
> > +i915_request_alloc_slow(struct intel_context *ce)
> > +{
> > +     struct intel_ring *ring = ce->ring;
> > +     struct i915_request *rq, *next;
> > +
> > +     list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> > +             /* Ratelimit ourselves to prevent oom from malicious clients */
> > +             if (&next->ring_link == &ring->request_list) {
> 
> list_is_last(next, &ring->request_list) ?

Tried it (needs list_is_last(&next->ring_link,...)), but I slightly
preferred not implying that next was a valid request here, and keeping
the matching form to list termination.
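
A minimal user-space sketch of that point, assuming a hand-rolled stand-in
for the kernel's struct list_head (list_init, list_add_tail and
next_is_list_head are names local to this sketch, not the kernel helpers):
when the open-coded comparison is true, "next" is the list head itself
rather than a link embedded in a request.

```c
#include <stdbool.h>

/* Stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Open-coded termination test in the style of the patch: look at the
 * link that follows the current entry and compare it with the head.
 * True only when the current entry is the final one on the list.
 */
static bool next_is_list_head(const struct list_head *entry,
			      const struct list_head *head)
{
	const struct list_head *next = entry->next;

	return next == head;
}
```

With two entries on the list, the test is false for the first and true for
the second, and in the true case the "next" link is the head, so there is no
request behind it to dereference.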

> > +                     cond_synchronize_rcu(rq->rcustate);
> > +                     break; /* keep the last objects for the next request */
> > +             }
> > +
> > +             if (!i915_request_completed(rq))
> > +                     break;
> > +
> > +             /* Retire our old requests in the hope that we free some */
> > +             i915_request_retire(rq);
> 
> The RCU wait against the last submitted rq is also gone. Now it only 
> syncs against the next to last rq, unless there are more than two live 
> requests. Is this what you intended?

Nah, I was trying to be too smart, forgetting that we didn't walk the
entire list. The RCU wait is against the last rq (since next is the
list head at that point, so unchanged wrt using list_last_entry), but
we break on seeing a busy request, so no ratelimiting if you keep the GPU
busy (not quite as intended!).
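
To make the two cases concrete, here is a rough model of the walk, not the
driver code: the ring is reduced to an array of completed flags (oldest
first), and struct walk_result and walk_ring() are hypothetical names used
only for this sketch. "throttled" stands for reaching the
cond_synchronize_rcu() call on the last request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one pass over the ring in this simplified model. */
struct walk_result {
	size_t retired;	/* requests retired before the walk stopped */
	bool throttled;	/* true if the walk reached the RCU wait */
};

/*
 * completed[i] is true for a completed request (r) and false for a
 * pending one (R); index 0 is the oldest request on the ring.
 */
static struct walk_result walk_ring(const bool *completed, size_t n)
{
	struct walk_result res = { 0, false };
	size_t i;

	for (i = 0; i < n; i++) {
		/* Last entry: throttle here and keep it for the next request. */
		if (i == n - 1) {
			res.throttled = true;
			break;
		}

		/* A busy request ends the walk before the throttle is reached. */
		if (!completed[i])
			break;

		/* Completed and not last: retire it. */
		res.retired++;
	}

	return res;
}
```

In this model r-r-r-R-R-R retires three requests and never throttles, while
r-r-r-r retires three and throttles on the final, already completed, request.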

> If the ring timeline is a list of r-r-r-R-R-R (r=completed, 
> R=pending) then it looks like it will not sync on anything.
> 
> And if the list is r-r-r-r it will sync against a completed rq. Which I 
> hope is a no-op, but still, the loop logic looks potentially dodgy.
> 
> It also has a higher level vulnerability to one hog timeline starving 
> the rest I think.

Also? Other than forgetting the earlier break preventing the throttling,
what else do you see wrong with throttling along a timeline/ring?
-Chris