[Intel-gfx] [PATCH 05/41] drm/i915: Restructure priority inheritance

Tue Jan 26 11:40:24 UTC 2021

On 26/01/2021 11:30, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 11:12:53)
>>
>>
>> On 25/01/2021 14:01, Chris Wilson wrote:
>>> +static void ipi_schedule(struct work_struct *wrk)
>>> +{
>>> +     struct i915_sched_ipi *ipi = container_of(wrk, typeof(*ipi), work);
>>> +     struct i915_request *rq = xchg(&ipi->list, NULL);
>>> +
>>> +     do {
>>> +             struct i915_request *rn = xchg(&rq->sched.ipi_link, NULL);
>>> +             int prio;
>>> +
>>> +             prio = ipi_get_prio(rq);
>>> +
>>> +             /*
>>> +              * For cross-engine scheduling to work we rely on one of two
>>> +              * things:
>>> +              *
>>> +              * a) The requests are using dma-fence fences and so will not
>>> +              * be scheduled until the previous engine is completed, and
>>> +              * so we cannot cross back onto the original engine and end up
>>> +              * queuing an earlier request after the first (due to the
>>> +              * interrupted DFS).
>>> +              *
>>> +              * b) The requests are using semaphores and so may be already
>>> +              * be in flight, in which case if we cross back onto the same
>>> +              * engine, we will already have put the interrupted DFS into
>>> +              * the priolist, and the continuation will now be queued
>>> +              * afterwards [out-of-order]. However, since we are using
>>> +              * semaphores in this case, we also perform yield on semaphore
>>> +              * waits and so will reorder the requests back into the correct
>>> +              * sequence. This occurrence (of promoting a request chain
>>> +              * that crosses the engines using semaphores back unto itself)
>>> +              * should be unlikely enough that it probably does not matter...
>>> +              */
>>> +             local_bh_disable();
>>> +             i915_request_set_priority(rq, prio);
>>> +             local_bh_enable();
>>
>> Is it that important and wouldn't the priority order restore eventually
>> due timeslicing?
> 
> There would be a window in which we executed userspace code
> out-of-order. That's enough to scare me! However, for our PI dependency
> chains it should not matter as the only time we do submit out-of-order,
> we are stuck on _our_ semaphore that cannot be resolved until the
> requests are back in-order.

Out of order how? Within a single timeline?! I though only with 
incomplete view of priority inheritance, which in my mind could only 
cause deadlocks (if no timeslicing). But really really out of order?

> I've tried to trick this into causing problems with the
> i915_selftest/igt_schedule_cycle and gem_exec_schedule/noreorder.
> Fortunately for my sanity, neither test have caught any problems.
> 
> This is the handwaving part of removing the global lock.
> 
>>> +     /*
>>> +      * If we are setting the priority before being submitted, see if we
>>> +      * can quickly adjust our own priority in-situ and avoid taking
>>> +      * the contended engine->active.lock. If we need priority inheritance,
>>> +      * take the slow route.
>>> +      */
>>> +     if (rq_prio(rq) == I915_PRIORITY_INVALID) {
>>> +             struct i915_dependency *p;
>>> +
>>> +             rcu_read_lock();
>>> +             for_each_signaler(p, rq) {
>>> +                     struct i915_request *s =
>>> +                             container_of(p->signaler, typeof(*s), sched);
>>> +
>>> +                     if (rq_prio(s) >= prio)
>>> +                             continue;
>>> +
>>> +                     if (__i915_request_is_complete(s))
>>> +                             continue;
>>> +
>>> +                     break;
>>> +             }
>>> +             rcu_read_unlock();
>>
>> Exit this loop with a first lower priority incomplete signaler. What
>> does the block below then do? Feels like it needs a comment.
> 
> I thought I had sufficiently explained that in the comment above.
> 
> /* Update priority in place if no PI required */
>>> +             if (&p->signal_link == &rq->sched.signalers_list &&
>>> +                 cmpxchg(&rq->sched.attr.priority,
>>> +                         I915_PRIORITY_INVALID,
>>> +                         prio) == I915_PRIORITY_INVALID)
>>> +                     return;
> 
> It could do a few more tricks to change the priority in-place a second
> time, but I did not think that would be frequent enough to matter.
> Whereas we always adjust the priority from INVALID once before
> submission, and avoiding taking the lock then does make a difference to
> the profiles.

To start with, if p is NULL or un-initialized (can be, no?) then 
relationship of &p->signal_link to &rq->sched.signalers_list escapes me.

Regards,

Tvrtko