[Intel-gfx] [PATCH 12/37] drm/i915: Priority boost for new clients

Mon Jul 2 10:19:02 UTC 2018

On 29/06/2018 12:10, Tvrtko Ursulin wrote:
> 
> On 29/06/2018 11:51, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2018-06-29 11:36:50)
>>>
>>> On 29/06/2018 11:09, Chris Wilson wrote:
>>>> Quoting Tvrtko Ursulin (2018-06-29 11:04:36)
>>>>>
>>>>> On 29/06/2018 08:53, Chris Wilson wrote:
>>>>>> Taken from an idea used for FQ_CODEL, we give new request flows a
>>>>>> priority boost. These flows are likely to correspond with interactive
>>>>>> tasks and so be more latency sensitive than the long queues. As 
>>>>>> soon as
>>>>>> the client has more than one request in the queue, further 
>>>>>> requests are
>>>>>> not boosted and it settles down into ordinary steady state behaviour.
>>>>>> Such small kicks dramatically help combat the starvation issue, by
>>>>>> allowing each client the opportunity to run even when the system is
>>>>>> under heavy throughput load (within the constraints of the user
>>>>>> selected priority).
>>>>>
>>>>> Any effect on non-micro benchmarks to mention in the commit message as
>>>>> the selling point?
>>>>
>>>> Desktop interactivity, subjective.
>>>> wsim showed a major impact
>>>>>> Testcase: igt/benchmarks/rrul
>>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/i915_request.c   | 16 ++++++++++++++--
>>>>>>     drivers/gpu/drm/i915/i915_scheduler.h |  4 +++-
>>>>>>     2 files changed, 17 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>>>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>>>> index 14bf0be6f994..2d7a785dd88c 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>>>> @@ -1052,8 +1052,20 @@ void i915_request_add(struct i915_request 
>>>>>> *request)
>>>>>>          */
>>>>>>         local_bh_disable();
>>>>>>         rcu_read_lock(); /* RCU serialisation for set-wedged 
>>>>>> protection */
>>>>>> -     if (engine->schedule)
>>>>>> -             engine->schedule(request, 
>>>>>> &request->gem_context->sched);
>>>>>> +     if (engine->schedule) {
>>>>>> +             struct i915_sched_attr attr = 
>>>>>> request->gem_context->sched;
>>>>>> +
>>>>>> +             /*
>>>>>> +              * Boost priorities to new clients (new request flows).
>>>>>> +              *
>>>>>> +              * Allow interactive/synchronous clients to jump 
>>>>>> ahead of
>>>>>> +              * the bulk clients. (FQ_CODEL)
>>>>>> +              */
>>>>>> +             if (!prev || i915_request_completed(prev))
>>>>>> +                     attr.priority |= I915_PRIORITY_NEWCLIENT;
>>>>>
>>>>> Now a "lucky" client can always get higher priority an keep preempting
>>>>> everyone else by just timing it's submissions right. So I have big
>>>>> reservations on this one.
>>>>
>>>> Lucky being someone who is _idle_, everyone else being steady state. 
>>>> You
>>>> can't keep getting lucky and stealing the show.
>>>
>>> Why couldn't it? All it is needed is to send a new execbuf after the
>>> previous has completed.
>>>
>>> 1. First ctx A eb -> priority boost
>>> 2. Other people get back in and start executing
>>> 3. Another ctx A eb -> first has completed -> another priority boost ->
>>> work from 2) is preempted
>>> 4. Goto 2.
>>
>> So you have one client spinning, it's going to win most races and starve
>> the system, simply by owning struct_mutex. We give the other starving
>> steady-state clients an opportunity to submit ahead of the spinner when
>> they come to resubmit.
> 
> What do you mean by spinning and owning struct_mutex?
> 
> I was thinking for example the client A sending a 1ms batch and client B 
> executing a 10ms batch.
> 
> If client A keeps submitting its batch at 1ms intervals when will client 
> B complete?

I was thinking over the weekend that a scheme where we check that at 
least something from a different timeline has been completed on an 
engine might be acceptable. Which effectively may be just checking that 
a greater global seqno than the previous request on this timeline has 
been completed.

But then there is a problem with the virtual engine which would make the 
test too strict - it could miss to apply the new client boost depending 
on which physical engine the request got assigned to. Unless we take the 
position that too strict is better than too lax.

Secondary problem is about the dependency chains. Just because 
client/request is classified as new on an engine, doesn't mean it is OK 
to boost all its dependencies.

But applying the "something executed since its last submission = new 
client" which would look at the whole dependency chain feels perhaps to 
complicated. It would require a pass over the dep list and checking with 
this criteria on all involved engines before new client boost could be 
applied.

Thoughts?

Regards,

Tvrtko