[Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jan 9 12:51:53 UTC 2019


On 09/01/2019 12:06, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
>>
>> On 07/01/2019 15:29, Chris Wilson wrote:
>>> In the continual quest to reduce the amount of global work required when
>>> submitting requests, replace i915_retire_requests() after allocation
>>> failure to retiring just our ring.
>>>
>>> References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 33 +++++++++++++++++++++--------
>>>    1 file changed, 24 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 1e158eb8cb97..9ba218c6029b 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>        return NOTIFY_DONE;
>>>    }
>>>    
>>> +static noinline struct i915_request *
>>> +i915_request_alloc_slow(struct intel_context *ce)
>>> +{
>>> +     struct intel_ring *ring = ce->ring;
>>> +     struct i915_request *rq, *next;
>>> +
>>> +     list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
>>> +             /* Ratelimit ourselves to prevent oom from malicious clients */
>>> +             if (&next->ring_link == &ring->request_list) {
>>
>> list_is_last(next, &ring->request_list) ?
> 
> Tried it (needs list_is_last(&next->ring_link,...)), but I slightly
> preferred not implying that next was a valid request here, and keeping
> the matching form to list termination.
>   
>>> +                     cond_synchronize_rcu(rq->rcustate);
>>> +                     break; /* keep the last objects for the next request */
>>> +             }
>>> +
>>> +             if (!i915_request_completed(rq))
>>> +                     break;
>>> +
>>> +             /* Retire our old requests in the hope that we free some */
>>> +             i915_request_retire(rq);
>> The RCU wait against the last submitted rq is also gone. Now it only
>> syncs against the next-to-last rq, unless there are more than two live
>> requests. Is this what you intended?
> 
> Nah, I was trying to be too smart, forgetting that we didn't walk the
> entire list. The RCU wait is against the last rq (since next is the
> list head at that point, so unchanged wrt using list_last_entry), but
> we break on seeing a busy request, so no ratelimiting if you keep the
> GPU busy (not quite as intended!).
>   
>> If the ring timeline is a list of r-r-r-R-R-R (r=completed,
>> R=pending) then it looks like it will not sync on anything.
>>
>> And if the list is r-r-r-r it will sync against a completed rq. Which I
>> hope is a no-op, but still, the loop logic looks potentially dodgy.
>>
>> It also has a higher level vulnerability to one hog timeline starving
>> the rest I think.
> 
> Also? Other than forgetting that the earlier break prevents the throttling,
> what else do you see wrong with throttling along a timeline/ring?

I was on the wrong track when thinking about the removal of the global 
retire. I thought the hog on one timeline would be able to starve the 
other timelines, but the hog will eventually hit the allocation failure 
and sync against itself. I think it's fine.

Regards,

Tvrtko
