[Intel-gfx] [PATCH 2/3] drm/i915/execlists: Suppress preempting self

Wed Jan 23 16:40:44 UTC 2019

On 23/01/2019 14:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-23 14:08:56)
>>
>> On 23/01/2019 12:36, Chris Wilson wrote:
>>> In order to avoid preempting ourselves, we currently refuse to schedule
>>> the tasklet if we reschedule an inflight context. However, this glosses
>>> over a few issues such as what happens after a CS completion event and
>>> we then preempt the newly executing context with itself, or if something
>>> else causes a tasklet_schedule triggering the same evaluation to
>>> preempt the active context with itself.
>>>
>>> To avoid the extra complications, after deciding that we have
>>> potentially queued a request with higher priority than the currently
>>> executing request, inspect the head of the queue to see if it is indeed
>>> higher priority from another context.
>>>
>>> References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_scheduler.c | 20 ++++++--
>>>    drivers/gpu/drm/i915/intel_lrc.c      | 67 ++++++++++++++++++++++++---
>>>    2 files changed, 76 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>>> index 340faea6c08a..fb5d953430e5 100644
>>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>>> @@ -239,6 +239,18 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
>>>        return engine;
>>>    }
>>>    
>>> +static bool inflight(const struct i915_request *rq,
>>> +                  const struct intel_engine_cs *engine)
>>> +{
>>> +     const struct i915_request *active;
>>> +
>>> +     if (!rq->global_seqno)
>>> +             return false;
>>> +
>>> +     active = port_request(engine->execlists.port);
>>> +     return active->hw_context == rq->hw_context;
>>> +}
>>> +
>>>    static void __i915_schedule(struct i915_request *rq,
>>>                            const struct i915_sched_attr *attr)
>>>    {
>>> @@ -328,6 +340,7 @@ static void __i915_schedule(struct i915_request *rq,
>>>                INIT_LIST_HEAD(&dep->dfs_link);
>>>    
>>>                engine = sched_lock_engine(node, engine);
>>> +             lockdep_assert_held(&engine->timeline.lock);
>>>    
>>>                /* Recheck after acquiring the engine->timeline.lock */
>>>                if (prio <= node->attr.priority || node_signaled(node))
>>> @@ -356,17 +369,16 @@ static void __i915_schedule(struct i915_request *rq,
>>>                if (prio <= engine->execlists.queue_priority)
>>>                        continue;
>>>    
>>> +             engine->execlists.queue_priority = prio;
>>> +
>>>                /*
>>>                 * If we are already the currently executing context, don't
>>>                 * bother evaluating if we should preempt ourselves.
>>>                 */
>>> -             if (node_to_request(node)->global_seqno &&
>>> -                 i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
>>> -                                   node_to_request(node)->global_seqno))
>>> +             if (inflight(node_to_request(node), engine))
>>>                        continue;
>>>    
>>>                /* Defer (tasklet) submission until after all of our updates. */
>>> -             engine->execlists.queue_priority = prio;
>>>                tasklet_hi_schedule(&engine->execlists.tasklet);
>>>        }
>>>    
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 8aa8a4862543..d9d744f6ab2c 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -182,12 +182,64 @@ static inline int rq_prio(const struct i915_request *rq)
>>>    }
>>>    
>>>    static inline bool need_preempt(const struct intel_engine_cs *engine,
>>> -                             const struct i915_request *last,
>>> -                             int prio)
>>> +                             const struct i915_request *rq,
>>> +                             int q_prio)
>>>    {
>>> -     return (intel_engine_has_preemption(engine) &&
>>> -             __execlists_need_preempt(prio, rq_prio(last)) &&
>>> -             !i915_request_completed(last));
>>> +     const struct intel_context *ctx = rq->hw_context;
>>> +     const int last_prio = rq_prio(rq);
>>> +     struct rb_node *rb;
>>> +     int idx;
>>> +
>>> +     if (!intel_engine_has_preemption(engine))
>>> +             return false;
>>> +
>>> +     if (i915_request_completed(rq))
>>> +             return false;
>>> +
>>> +     if (!__execlists_need_preempt(q_prio, last_prio))
>>> +             return false;
>>> +
>>> +     /*
>>> +      * The queue_priority is a mere hint that we may need to preempt.
>>> +      * If that hint is stale or we may be trying to preempt ourselves,
>>> +      * ignore the request.
>>> +      */
>>> +
>>> +     list_for_each_entry_continue(rq, &engine->timeline.requests, link) {
>>> +             GEM_BUG_ON(rq->hw_context == ctx);
>>
>> Why would there be no more requests belonging to the same context on the
>> engine timeline after the first one? Did you mean "if (rq->hw_context ==
>> ctx) continue;" ?
> 
> We enter the function with rq == execlist->port, i.e. the last request
> submitted to ELSP[0]. In this loop, we iterate from the start of ELSP[1]
> and all the request here belong to that context. It is illegal for
> ELSP[0].lrca == ELSP[1].lrca, i.e. the context must be different.

Yes, you are right yet again. I missed the fact it is guaranteed this is 
called with port0. I wonder if function signature should change to make 
this obvious so someone doesn't get the idea to call it with any old 
request.

> 
>>
>>> +             if (rq_prio(rq) > last_prio)
>>> +                     return true;
>>> +     }
>>> +
>>> +     rb = rb_first_cached(&engine->execlists.queue);
>>> +     if (!rb)
>>> +             return false;
>>> +
>>> +     priolist_for_each_request(rq, to_priolist(rb), idx)
>>> +             return rq->hw_context != ctx && rq_prio(rq) > last_prio;
>>
>> This isn't equivalent to top of the queue priority
>> (engine->execlists.queue_priority)? Apart from the different ctx check.
>> So I guess it is easier than storing new engine->execlists.queue_top_ctx
>> and wondering about the validity of that pointer.
> 
> The problem being that queue_priority may not always match the top of
> the execlists->queue. Right, the first attempt I tried was to store the
> queue_context matching the queue_priority, but due to the suppression of
> inflight preemptions, it doesn't match for long, and is not as accurate
> as one would like across CS events.
> 
> priolist_for_each_request() isn't too horrible for finding the first
> pointer. I noted that we teach it to do: for(idx = __ffs(p->used); ...)
> though. If we didn't care about checking hw_context, we can compute the
> prio from (p->priority+1)<<SHIFT - ffs(p->used).

And this check is definitely needed to avoid some issue? I'll need to 
have another try of understanding the commit and code paths fully 
tomorrow. I mean, only because it would be good to have something more 
elegant that full blown tree lookup.

Regards,

Tvrtko