[Intel-gfx] [PATCH 29/32] drm/i915: Apply an execution_mask to the virtual_engine

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Apr 17 12:35:29 UTC 2019


On 17/04/2019 12:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-04-17 12:43:49)
>>
>> On 17/04/2019 08:56, Chris Wilson wrote:
>>> Allow the user to direct which physical engines of the virtual engine
>>> they wish to execute on, as sometimes it is necessary to override the
>>> load balancing algorithm.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c    |  58 +++++++++++
>>>    drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 +++++++++++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_request.c    |   1 +
>>>    drivers/gpu/drm/i915/i915_request.h    |   3 +
>>>    4 files changed, 193 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index d6efd6aa67cb..560a18bb4cbb 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -552,6 +552,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
>>>        intel_engine_context_out(rq->engine);
>>>        execlists_context_status_change(rq, status);
>>>        trace_i915_request_out(rq);
>>> +
>>> +     /*
>>> +      * If this is part of a virtual engine, its next request may have
>>> +      * been blocked waiting for access to the active context. We have
>>> +      * to kick all the siblings again in case we need to switch (e.g.
>>> +      * the next request is not runnable on this engine). Hopefully,
>>> +      * we will already have submitted the next request before the
>>> +      * tasklet runs and do not need to rebuild each virtual tree
>>> +      * and kick everyone again.
>>> +      */
>>> +     if (rq->engine != rq->hw_context->engine)
>>> +             tasklet_schedule(&rq->hw_context->engine->execlists.tasklet);
>>
>> Is this needed only for non-default execution_mask? If so, it would be
>> good to limit it to avoid a tasklet storm with plain veng.
> 
> The issue is not just with this rq but the next one. If that has a
> restricted mask that prevents it running on this engine, we may have
> missed the opportunity to queue it (and so never run it under just the
> right circumstances).
> 
> Something like
> 	to_virtual_engine(rq->hw_context->engine)->request->execution_mask & ~rq->execution_mask
> 
> The storm isn't quite so bad; it's only on context-out, and we often do
> succeed in keeping it busy. I was just trying to avoid pulling in ve here.

What do you mean by the "pulling in ve" bit? Avoiding the use of 
to_virtual_engine, like in the line you wrote above?
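
To check I am reading your suggestion right, is it something like this 
(rough sketch only; the helper name is made up, fields as in this series)?

	static void kick_siblings(struct i915_request *rq)
	{
		struct virtual_engine *ve =
			to_virtual_engine(rq->hw_context->engine);
		struct i915_request *next = READ_ONCE(ve->request);

		/* Kick the siblings only when the next veng request is
		 * restricted away from this engine. */
		if (next && (next->execution_mask & ~rq->execution_mask))
			tasklet_schedule(&ve->base.execlists.tasklet);
	}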

> 
>>>    static u64 execlists_update_context(struct i915_request *rq)
>>> @@ -779,6 +791,9 @@ static bool virtual_matches(const struct virtual_engine *ve,
>>>    {
>>>        const struct intel_engine_cs *active;
>>>    
>>> +     if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */
>>> +             return false;
>>> +
>>>        /*
>>>         * We track when the HW has completed saving the context image
>>>         * (i.e. when we have seen the final CS event switching out of
>>> @@ -3139,12 +3154,44 @@ static const struct intel_context_ops virtual_context_ops = {
>>>        .destroy = virtual_context_destroy,
>>>    };
>>>    
>>> +static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
>>> +{
>>> +     struct i915_request *rq;
>>> +     intel_engine_mask_t mask;
>>
>> intel_engine_mask_t throughout is the wrong type for this, even if we
>> disallowed duplicate siblings, and even more so if we don't.
> 
> Why? It's a mask of engine->mask (and has to be of the physical id to
> allow it to be position invariant in siblings[]).
>   
>> Either way it seems like the 64 sibling limit needs to be back. Or maybe
>> only in the bonding case?
> 
> ?

My bad. I mistakenly thought execution_mask relates to the position of 
the engines in the siblings array.
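
I.e. it is expressed in terms of physical engine->mask bits, something 
like (made-up values, just to illustrate):

	/*
	 * Allow only the two physical VCS engines, regardless of where
	 * they sit in siblings[].
	 */
	rq->execution_mask = BIT(VCS0) | BIT(VCS1);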

>>
>>> +
>>> +     rq = READ_ONCE(ve->request);
>>> +     if (!rq)
>>> +             return 0;
>>> +
>>> +     /* The rq is ready for submission; rq->execution_mask is now stable. */
>>> +     mask = rq->execution_mask;
>>> +     if (unlikely(!mask)) {
>>> +             /* Invalid selection, submit to a random engine in error */
>>> +             i915_request_skip(rq, -ENODEV);
>>
>> When can this happen? It looks like if it can happen we should reject it
>> earlier. Or if it can't then just assert.
> 
> Many submit fences can end up with an intersection of 0. This is the
> convenient point to do the rejection, as with any other asynchronous
> error.

Which ones are these? Why would we have uAPI which allows setting up 
impossible combinations where all requests will fail with -ENODEV?
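
For instance, is the scenario something along these lines (mask values 
made up purely for illustration)?

	intel_engine_mask_t mask = rq->execution_mask; /* ve->base.mask */

	mask &= BIT(VCS0);	/* restriction from one submit fence */
	mask &= BIT(VCS1);	/* restriction from another -> now 0 */

If so, I would have expected us to be able to reject it at the point the 
second restriction is applied.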

> 
>>> +             mask = ve->siblings[0]->mask;
>>> +     }
>>> +
>>> +     GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n",
>>> +               ve->base.name,
>>> +               rq->fence.context, rq->fence.seqno,
>>> +               mask, ve->base.execlists.queue_priority_hint);
>>> +
>>> +     return mask;
>>> +}
>>> +
>>>    static void virtual_submission_tasklet(unsigned long data)
>>>    {
>>>        struct virtual_engine * const ve = (struct virtual_engine *)data;
>>>        const int prio = ve->base.execlists.queue_priority_hint;
>>> +     intel_engine_mask_t mask;
>>>        unsigned int n;
>>>    
>>> +     rcu_read_lock();
>>> +     mask = virtual_submission_mask(ve);
>>> +     rcu_read_unlock();
>>
>> What is the RCU for?
> 
> Accessing ve->request. There's nothing stopping another engine from
> spotting the ve->request still in its tree, submitting it, and it being
> retired, all during the read here.

AFAIU there can only be one instance of virtual_submission_tasklet 
running per VE at a time, and the code above runs before the request is 
inserted into the physical engine trees. So I don't get it.

Hm.. but going back to the veng patch, there is a 
GEM_BUG_ON(ve->request) in virtual_submit_request. Why couldn't this be 
called multiple times in parallel for different requests?
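
I.e. a (hypothetical) sequence like:

	thread A: virtual_submit_request(rq1);	/* sets ve->request = rq1 */
	thread B: virtual_submit_request(rq2);	/* GEM_BUG_ON(ve->request)? */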

Regards,

Tvrtko

