[Intel-gfx] [PATCH 09/11] drm/i915/execlists: Refactor out can_merge_rq()

Thu Jan 31 09:19:18 UTC 2019

On 30/01/2019 18:14, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-30 18:05:42)
>>
>> On 30/01/2019 02:19, Chris Wilson wrote:
>>> In the next patch, we add another user that wants to check whether
>>> requests can be merged into a single HW execution, and in the future we
>>> want to add more conditions under which requests from the same context
>>> cannot be merged. In preparation, extract out can_merge_rq().
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
>>>    1 file changed, 19 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 2616b0b3e8d5..e97ce54138d3 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>>>    }
>>>    
>>>    __maybe_unused static inline bool
>>> -assert_priority_queue(const struct intel_engine_execlists *execlists,
>>> -                   const struct i915_request *prev,
>>> +assert_priority_queue(const struct i915_request *prev,
>>>                      const struct i915_request *next)
>>>    {
>>> -     if (!prev)
>>> -             return true;
>>> +     const struct intel_engine_execlists *execlists =
>>> +             &prev->engine->execlists;
>>>    
>>>        /*
>>>         * Without preemption, the prev may refer to the still active element
>>> @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
>>>        return true;
>>>    }
>>>    
>>> +static bool can_merge_rq(const struct i915_request *prev,
>>> +                      const struct i915_request *next)
>>> +{
>>> +     GEM_BUG_ON(!assert_priority_queue(prev, next));
>>> +
>>> +     if (!can_merge_ctx(prev->hw_context, next->hw_context))
>>> +             return false;
>>> +
>>> +     return true;
>>
>> I'll assume you'll be adding here in the future as the reason this is
>> not simply "return can_merge_ctx(...)"?
> 
> Yes, raison d'etre of making the change.
> 
>>>    static void port_assign(struct execlist_port *port, struct i915_request *rq)
>>>    {
>>>        GEM_BUG_ON(rq == port_request(port));
>>> @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                int i;
>>>    
>>>                priolist_for_each_request_consume(rq, rn, p, i) {
>>> -                     GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
>>> -
>>>                        /*
>>>                         * Can we combine this request with the current port?
>>>                         * It has to be the same context/ringbuffer and not
>>> @@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                         * second request, and so we never need to tell the
>>>                         * hardware about the first.
>>>                         */
>>> -                     if (last &&
>>> -                         !can_merge_ctx(rq->hw_context, last->hw_context)) {
>>> +                     if (last && !can_merge_rq(last, rq)) {
>>> +                             if (last->hw_context == rq->hw_context)
>>> +                                     goto done;
>>
>> I don't get this added check. AFAICS it will only trigger with GVT
>> making it not consider filling both ports if possible.
> 
> Because we are preparing for can_merge_rq() deciding not to merge the
> same context. If we do that we can't continue on to the next port and
> must terminate the loop, violating the trick with the hint in the
> process.
> 
> This changes due to the next patch, per-context freq and probably more
> that I've forgotten.

After a second look, I noticed the existing GVT comment a bit lower down 
which avoids populating port1 already.

Maybe one thing that would make sense is to rearrange these checks in 
order of "priority", like:

	if (last && !can_merge_rq(...)) {
		// naturally highest prio since it is impossible
		if (port == last_port)
			goto done;
		// 2nd highest to account for programming limitation
		else if (last->hw_context == rq->hw_context)
			goto done;
		// GVT check simplified (I think - since we know last is
		// either different ctx or single submit)
		else if (ctx_single_port_submission(rq->hw_context))
			goto done;
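
To make the proposed ordering easier to reason about, here is a minimal standalone sketch of it. It is only a toy model: toy_ctx, toy_rq, toy_action, toy_can_merge_rq() and toy_classify() are hypothetical names that do not exist in the driver, and toy_can_merge_rq() assumes the merge rule is "same context and not single-port submission". It mirrors the simplified GVT check as suggested above and has not been checked against the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real i915 structures. */
struct toy_ctx {
	int id;
	bool single_port_submission; /* models ctx_single_port_submission() */
};

struct toy_rq {
	const struct toy_ctx *hw_context;
};

enum toy_action {
	TOY_MERGE,     /* append rq to the current port */
	TOY_NEXT_PORT, /* close the current port and move to the next one */
	TOY_DONE,      /* stop dequeuing, i.e. "goto done" */
};

/* Assumed merge rule: same context, and not a single-submission context. */
static bool toy_can_merge_rq(const struct toy_rq *prev,
			     const struct toy_rq *next)
{
	if (prev->hw_context != next->hw_context)
		return false;

	if (prev->hw_context->single_port_submission)
		return false;

	return true;
}

/* Apply the checks in the suggested order of "priority". */
static enum toy_action toy_classify(const struct toy_rq *last,
				    const struct toy_rq *rq,
				    bool on_last_port)
{
	assert(rq != NULL);

	if (!last || toy_can_merge_rq(last, rq))
		return TOY_MERGE;

	/* Highest prio: there is no further port to move on to. */
	if (on_last_port)
		return TOY_DONE;

	/* 2nd: same context that refused to merge cannot span two ports. */
	if (last->hw_context == rq->hw_context)
		return TOY_DONE;

	/* 3rd: simplified GVT check, looking at rq only. */
	if (rq->hw_context->single_port_submission)
		return TOY_DONE;

	return TOY_NEXT_PORT;
}
```

With two ordinary contexts and a free second port this yields TOY_NEXT_PORT; every other non-mergeable combination ends the loop.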
> 
>>> +
>>>                                /*
>>>                                 * If we are on the second port and cannot
>>>                                 * combine this request with the last, then we
>>> @@ -787,7 +797,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                                    ctx_single_port_submission(rq->hw_context))
>>>                                        goto done;
>>>    
>>> -                             GEM_BUG_ON(last->hw_context == rq->hw_context);
>>
>> This is related to the previous comment. Rebase error?
> 
> Previous if check, so it's clear at this point that we can't be using
> the same.

Yep.

> 
>>> @@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>         * request triggering preemption on the next dequeue (or subsequent
>>>         * interrupt for secondary ports).
>>>         */
>>> -     execlists->queue_priority_hint =
>>> -             port != execlists->port ? rq_prio(last) : INT_MIN;
>>> +     execlists->queue_priority_hint = queue_prio(execlists);
>>
>> This shouldn't be in this patch.
> 
> If we terminate the loop early, we need to look at the head of the
> queue.

Why is it different from ending early for any other (existing) reason? 
Although I concede better management of queue_priority_hint is exactly 
what I was suggesting. Oops. The consequences are not entirely 
straightforward though: if we decide not to submit all of a single 
context, or leave port1 empty, currently we would hint scheduling the 
tasklet for any new submission. With this change, that happens only after 
a context switch (CS) or if a higher priority ctx is submitted. That is 
what makes me feel it should be a separate patch, as it is a behaviour 
change (since a high prio, higher than INT_MIN, is potentially head of 
the queue).
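
To illustrate the difference, a rough standalone sketch follows. It is a toy model under stated assumptions: toy_execlists, toy_hint_old(), toy_hint_new() and toy_would_kick() are hypothetical names, the new hint is assumed to be the priority at the head of the remaining queue (INT_MIN when empty), and a new submission is assumed to schedule the tasklet only when its priority exceeds the hint.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy snapshot of the state at the end of dequeue; not the real struct. */
struct toy_execlists {
	bool second_port_used; /* models "port != execlists->port" */
	int last_prio;         /* models rq_prio(last) */
	bool queue_empty;      /* nothing left in the priority queue */
	int queue_head_prio;   /* priority at the head of the remaining queue */
};

/* Hint as computed before the patch. */
static int toy_hint_old(const struct toy_execlists *el)
{
	assert(el != NULL);
	return el->second_port_used ? el->last_prio : INT_MIN;
}

/* Hint after the patch, assuming it tracks the head of the queue. */
static int toy_hint_new(const struct toy_execlists *el)
{
	assert(el != NULL);
	return el->queue_empty ? INT_MIN : el->queue_head_prio;
}

/* Assumed submission rule: kick the tasklet only above the hint. */
static bool toy_would_kick(int new_prio, int hint)
{
	return new_prio > hint;
}
```

For example, with port1 left empty and a request of priority 5 still queued, the old hint is INT_MIN, so a new submission at priority 3 would kick the tasklet; with the new hint of 5 it would not, and only a submission above 5 would.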

Regards,

Tvrtko