[Intel-gfx] [RFC 1/6] drm/i915: Individual request cancellation

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Mar 15 17:37:27 UTC 2021


On 12/03/2021 15:46, Tvrtko Ursulin wrote:
> From: Chris Wilson <chris at chris-wilson.co.uk>
> 
> Currently, we cancel outstanding requests within a context when the
> context is closed. We may also want to cancel individual requests using
> the same graceful preemption mechanism.
> 
> v2 (Tvrtko):
>   * Cancel waiters carefully considering no timeline lock and RCU.
>   * Fixed selftests.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

[snip]

> +void i915_request_cancel(struct i915_request *rq, int error)
> +{
> +	if (!i915_request_set_error_once(rq, error))
> +		return;
> +
> +	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +
> +	if (i915_sw_fence_signaled(&rq->submit)) {
> +		struct i915_dependency *p;
> +
> +restart:
> +		rcu_read_lock();
> +		for_each_waiter(p, rq) {
> +			struct i915_request *w =
> +				container_of(p->waiter, typeof(*w), sched);
> +
> +			if (__i915_request_is_complete(w) ||
> +			    fatal_error(w->fence.error))
> +				continue;
> +
> +			w = i915_request_get(w);
> +			rcu_read_unlock();
> +			/* Recursion bound by the number of engines */
> +			i915_request_cancel(w, error);
> +			i915_request_put(w);
> +
> +			/* Restart after having to drop rcu lock. */
> +			goto restart;
> +		}

So I need to fix this error propagation to waiters in order to avoid a
potential stack overflow, which was caught in shards (gem_ctx_ringsize).

Alternatively, we decide not to propagate fence errors. I am not sure
the consequences either way are particularly better or worse. Things
will break anyway, since what userspace actually inspects fences for
unexpected errors?!

So rendering corruption, more or less. Whether it can cause a further
stream of GPU hangs I am not sure; only if there is an inter-engine data
dependency involving data more complex than images/textures.
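
For illustration only, one way to keep the propagation to waiters while
bounding stack usage is to replace the recursion with an explicit
worklist. The following is a minimal userspace sketch, not i915 code:
struct toy_request, toy_add_waiter(), toy_set_error_once(), toy_cancel()
and the MAX_* limits are all hypothetical names made up for the example,
and it ignores the RCU, reference counting and locking that the real
function has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4	/* assumed fan-out limit, illustration only */
#define MAX_NODES 64	/* assumed upper bound on reachable requests */

/* Hypothetical stand-in for a request; not the i915 structure. */
struct toy_request {
	int error;		/* 0 means no error recorded yet */
	bool complete;
	struct toy_request *waiters[MAX_WAITERS];
	size_t nr_waiters;
};

static void toy_add_waiter(struct toy_request *signaler,
			   struct toy_request *waiter)
{
	assert(signaler->nr_waiters < MAX_WAITERS);
	signaler->waiters[signaler->nr_waiters++] = waiter;
}

/* Record the first error only; returns true if this call set it. */
static bool toy_set_error_once(struct toy_request *rq, int error)
{
	if (rq->error)
		return false;
	rq->error = error;
	return true;
}

/*
 * Cancel rq and every incomplete waiter reachable from it without
 * recursing. A request is marked before it is pushed, so it is pushed
 * at most once and the worklist never holds more than MAX_NODES
 * entries. Returns the number of requests cancelled by this call.
 */
static size_t toy_cancel(struct toy_request *rq, int error)
{
	struct toy_request *stack[MAX_NODES];
	size_t top = 0, cancelled = 0;

	if (!toy_set_error_once(rq, error))
		return 0;
	stack[top++] = rq;

	while (top) {
		struct toy_request *cur = stack[--top];
		size_t i;

		cancelled++;
		for (i = 0; i < cur->nr_waiters; i++) {
			struct toy_request *w = cur->waiters[i];

			/* Skip finished or already failed waiters. */
			if (w->complete || !toy_set_error_once(w, error))
				continue;

			assert(top < MAX_NODES);
			stack[top++] = w;
		}
	}

	return cancelled;
}
```

With this shape a chain of fifty dependent requests is walked using a
fixed-size array rather than fifty nested stack frames, and a second
toy_cancel() on the same request returns 0 because the error is only
set once.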

Regards,

Tvrtko

> +		rcu_read_unlock();
> +	}
> +
> +	__cancel_request(rq);
> +}

