[Intel-gfx] [PATCH v2] drm/i915: Check for awaits on still currently executing requests

Fri May 29 16:01:55 UTC 2020

On 29/05/2020 15:39, Chris Wilson wrote:
> With the advent of preempt-to-busy, a request may still be on the GPU as
> we unwind. And in the case of an unpreemptible [due to HW] request, that
> request will remain indefinitely on the GPU even though we have
> returned it to our submission queue and cleared the active bit.
> 
> We only run the execution callbacks on transferring the request from our
> submission queue to the execution queue, but if this is a bonded request
> that the HW is waiting for, we will not submit it (as we wait for a
> fresh execution) even though it is still being executed.
> 
> As we know that there are always preemption points between requests, we
> know that only the currently executing request may still be active even
> though we have cleared the flag. However, we do not precisely know which
> request is in ELSP[0] due to a delay in processing events, and
> furthermore we only store the last request in a context in our state
> tracker.
> 
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Testcase: igt/gem_exec_balancer/bonded-dual
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 49 ++++++++++++++++++++++++++++-
>   1 file changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index e5aba6824e26..c5d7220de529 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -363,6 +363,53 @@ static void __llist_add(struct llist_node *node, struct llist_head *head)
>   	head->first = node;
>   }
>   
> +static struct i915_request * const *
> +__engine_active(struct intel_engine_cs *engine)
> +{
> +	return READ_ONCE(engine->execlists.active);
> +}
> +
> +static bool __request_in_flight(const struct i915_request *signal)
> +{
> +	struct i915_request * const *port, *rq;
> +	bool inflight = false;
> +
> +	if (!i915_request_is_ready(signal))
> +		return false;
> +
> +	/*
> +	 * Even if we have unwound the request, it may still be on
> +	 * the GPU (preempt-to-busy). If that request is inside an
> +	 * unpreemptible critical section, it will not be removed. Some
> +	 * GPU functions may even be stuck waiting for the paired request
> +	 * (__await_execution) to be submitted and cannot be preempted
> +	 * until the bond is executing.
> +	 *
> +	 * As we know that there are always preemption points between
> +	 * requests, we know that only the currently executing request
> +	 * may still be active even though we have cleared the flag.
> +	 * However, we can't rely on our tracking of ELSP[0] to know
> +	 * which request is currently active and so may be stuck, as
> +	 * the tracking may be an event behind. Instead assume that
> +	 * if the context is still inflight, then it is still active
> +	 * even if the active flag has been cleared.
> +	 */
> +	if (!intel_context_inflight(signal->context))
> +		return false;
> +
> +	rcu_read_lock();
> +	for (port = __engine_active(signal->engine); (rq = *port); port++) {
> +		if (rq->context == signal->context) {
> +			inflight = i915_seqno_passed(rq->fence.seqno,
> +						     signal->fence.seqno);
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return inflight;
> +}
> +
>   static int
>   __await_execution(struct i915_request *rq,
>   		  struct i915_request *signal,
> @@ -393,7 +440,7 @@ __await_execution(struct i915_request *rq,
>   	}
>   
>   	spin_lock_irq(&signal->lock);
> -	if (i915_request_is_active(signal)) {
> +	if (i915_request_is_active(signal) || __request_in_flight(signal)) {
>   		if (hook) {
>   			hook(rq, &signal->fence);
>   			i915_request_put(signal);
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko
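
For readers following the thread without the driver source to hand, the
check in __request_in_flight() can be sketched in standalone C. This is
only an illustration under stated assumptions, not the driver code: the
struct sketch_request type and the sketch_* helpers are hypothetical
stand-ins, the RCU protection and the READ_ONCE() on
engine->execlists.active are left out, and the seqno comparison is
assumed to be the usual wraparound-safe signed difference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for i915_request; only the fields used below. */
struct sketch_request {
	const void *context;	/* identifies the owning context */
	uint32_t seqno;		/* per-context fence sequence number */
};

/*
 * Wraparound-safe "a is at or after b" (assumed to mirror what
 * i915_seqno_passed() does): the unsigned difference is reinterpreted
 * as signed, so the ordering survives a 32-bit seqno wrap.
 */
static bool sketch_seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Walk a NULL-terminated array of executing requests. If the signal's
 * context is found on the hardware, treat the signal as in flight when
 * the executing request is at or beyond the signal's seqno.
 */
static bool sketch_request_in_flight(const struct sketch_request *const *port,
				     const struct sketch_request *signal)
{
	const struct sketch_request *rq;

	for (; (rq = *port) != NULL; port++) {
		if (rq->context == signal->context)
			return sketch_seqno_passed(rq->seqno, signal->seqno);
	}

	return false;	/* context not found in any port */
}
```

As in the patch, the walk stops at the first port whose context matches:
per the commit message, only the last request of a context is tracked,
so a single comparison against it decides the answer.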