[Intel-gfx] [PATCH 2/3] drm/i915/execlists: Suppress preempting self
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jan 23 16:40:44 UTC 2019
On 23/01/2019 14:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-23 14:08:56)
>>
>> On 23/01/2019 12:36, Chris Wilson wrote:
>>> In order to avoid preempting ourselves, we currently refuse to schedule
>>> the tasklet if we reschedule an inflight context. However, this glosses
>>> over a few issues such as what happens after a CS completion event and
>>> we then preempt the newly executing context with itself, or if something
>>> else causes a tasklet_schedule triggering the same evaluation to
>>> preempt the active context with itself.
>>>
>>> To avoid the extra complications, after deciding that we have
>>> potentially queued a request with higher priority than the currently
>>> executing request, inspect the head of the queue to see if it is indeed
>>> higher priority from another context.
>>>
>>> References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_scheduler.c | 20 ++++++--
>>> drivers/gpu/drm/i915/intel_lrc.c | 67 ++++++++++++++++++++++++---
>>> 2 files changed, 76 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>>> index 340faea6c08a..fb5d953430e5 100644
>>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>>> @@ -239,6 +239,18 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
>>> return engine;
>>> }
>>>
>>> +static bool inflight(const struct i915_request *rq,
>>> + const struct intel_engine_cs *engine)
>>> +{
>>> + const struct i915_request *active;
>>> +
>>> + if (!rq->global_seqno)
>>> + return false;
>>> +
>>> + active = port_request(engine->execlists.port);
>>> + return active->hw_context == rq->hw_context;
>>> +}
>>> +
>>> static void __i915_schedule(struct i915_request *rq,
>>> const struct i915_sched_attr *attr)
>>> {
>>> @@ -328,6 +340,7 @@ static void __i915_schedule(struct i915_request *rq,
>>> INIT_LIST_HEAD(&dep->dfs_link);
>>>
>>> engine = sched_lock_engine(node, engine);
>>> + lockdep_assert_held(&engine->timeline.lock);
>>>
>>> /* Recheck after acquiring the engine->timeline.lock */
>>> if (prio <= node->attr.priority || node_signaled(node))
>>> @@ -356,17 +369,16 @@ static void __i915_schedule(struct i915_request *rq,
>>> if (prio <= engine->execlists.queue_priority)
>>> continue;
>>>
>>> + engine->execlists.queue_priority = prio;
>>> +
>>> /*
>>> * If we are already the currently executing context, don't
>>> * bother evaluating if we should preempt ourselves.
>>> */
>>> - if (node_to_request(node)->global_seqno &&
>>> - i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
>>> - node_to_request(node)->global_seqno))
>>> + if (inflight(node_to_request(node), engine))
>>> continue;
>>>
>>> /* Defer (tasklet) submission until after all of our updates. */
>>> - engine->execlists.queue_priority = prio;
>>> tasklet_hi_schedule(&engine->execlists.tasklet);
>>> }
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 8aa8a4862543..d9d744f6ab2c 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -182,12 +182,64 @@ static inline int rq_prio(const struct i915_request *rq)
>>> }
>>>
>>> static inline bool need_preempt(const struct intel_engine_cs *engine,
>>> - const struct i915_request *last,
>>> - int prio)
>>> + const struct i915_request *rq,
>>> + int q_prio)
>>> {
>>> - return (intel_engine_has_preemption(engine) &&
>>> - __execlists_need_preempt(prio, rq_prio(last)) &&
>>> - !i915_request_completed(last));
>>> + const struct intel_context *ctx = rq->hw_context;
>>> + const int last_prio = rq_prio(rq);
>>> + struct rb_node *rb;
>>> + int idx;
>>> +
>>> + if (!intel_engine_has_preemption(engine))
>>> + return false;
>>> +
>>> + if (i915_request_completed(rq))
>>> + return false;
>>> +
>>> + if (!__execlists_need_preempt(q_prio, last_prio))
>>> + return false;
>>> +
>>> + /*
>>> + * The queue_priority is a mere hint that we may need to preempt.
>>> + * If that hint is stale or we may be trying to preempt ourselves,
>>> + * ignore the request.
>>> + */
>>> +
>>> + list_for_each_entry_continue(rq, &engine->timeline.requests, link) {
>>> + GEM_BUG_ON(rq->hw_context == ctx);
>>
>> Why would there be no more requests belonging to the same context on the
>> engine timeline after the first one? Did you mean "if (rq->hw_context ==
>> ctx) continue;" ?
>
> We enter the function with rq == execlist->port, i.e. the last request
> submitted to ELSP[0]. In this loop, we iterate from the start of ELSP[1]
> and all the request here belong to that context. It is illegal for
> ELSP[0].lrca == ELSP[1].lrca, i.e. the context must be different.
Yes, you are right yet again. I missed the fact it is guaranteed this is
called with port0. I wonder if function signature should change to make
this obvious so someone doesn't get the idea to call it with any old
request.
>
>>
>>> + if (rq_prio(rq) > last_prio)
>>> + return true;
>>> + }
>>> +
>>> + rb = rb_first_cached(&engine->execlists.queue);
>>> + if (!rb)
>>> + return false;
>>> +
>>> + priolist_for_each_request(rq, to_priolist(rb), idx)
>>> + return rq->hw_context != ctx && rq_prio(rq) > last_prio;
>>
>> This isn't equivalent to top of the queue priority
>> (engine->execlists.queue_priority)? Apart from the different ctx check.
>> So I guess it is easier than storing new engine->execlists.queue_top_ctx
>> and wondering about the validity of that pointer.
>
> The problem being that queue_priority may not always match the top of
> the execlists->queue. Right, the first attempt I tried was to store the
> queue_context matching the queue_priority, but due to the suppression of
> inflight preemptions, it doesn't match for long, and is not as accurate
> as one would like across CS events.
>
> priolist_for_each_request() isn't too horrible for finding the first
> pointer. I noted that we teach it to do: for(idx = __ffs(p->used); ...)
> though. If we didn't care about checking hw_context, we can compute the
> prio from (p->priority+1)<<SHIFT - ffs(p->used).
And this check is definitely needed to avoid some issue? I'll need to
have another try of understanding the commit and code paths fully
tomorrow. I mean, only because it would be good to have something more
elegant that full blown tree lookup.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list