[Intel-gfx] [PATCH] drm/i915/gem: Propagate error from cancelled submit due to context closure

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Dec 4 11:09:47 UTC 2020


On 03/12/2020 10:34, Chris Wilson wrote:
> In the course of discovering and closing many races with context closure
> and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check
> that the context wasn't closed during setup") we started checking that
> the context was not closed by another userspace thread during the execbuf
> ioctl. In doing so we cancelled the inflight request (by telling it to be
> skipped), but kept reporting success since we do submit a request, albeit
> one that doesn't execute. As the error is known before we return from the
> ioctl, we can report the error we detect immediately, rather than leave
> it on the fence status. With the immediate propagation of the error, it
> is easier for userspace to handle.
> 
> Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup")
> Testcase: igt/gem_ctx_exec/basic-close-race
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: <stable at vger.kernel.org> # v5.7+
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 1904e6e5ea64..b07dc1156a0e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3097,7 +3097,7 @@ static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
>   			break;
>   }
>   
> -static void eb_request_add(struct i915_execbuffer *eb)
> +static int eb_request_add(struct i915_execbuffer *eb, int err)
>   {
>   	struct i915_request *rq = eb->request;
>   	struct intel_timeline * const tl = i915_request_timeline(rq);
> @@ -3118,6 +3118,7 @@ static void eb_request_add(struct i915_execbuffer *eb)
>   		/* Serialise with context_close via the add_to_timeline */
>   		i915_request_set_error_once(rq, -ENOENT);
>   		__i915_request_skip(rq);
> +		err = -ENOENT; /* override any transient errors */
>   	}
>   
>   	__i915_request_queue(rq, &attr);
> @@ -3127,6 +3128,8 @@ static void eb_request_add(struct i915_execbuffer *eb)
>   		retire_requests(tl, prev);
>   
>   	mutex_unlock(&tl->mutex);
> +
> +	return err;
>   }
>   
>   static const i915_user_extension_fn execbuf_extensions[] = {
> @@ -3332,7 +3335,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	err = eb_submit(&eb, batch);
>   err_request:
>   	i915_request_get(eb.request);
> -	eb_request_add(&eb);
> +	err = eb_request_add(&eb, err);
>   
>   	if (eb.fences)
>   		signal_fence_array(&eb);
> 

Simple enough and it explains why gem_busy assert in the IGT can fail 
after execbuf succeeded - skipped request executes before the check 
since the payload was zapped. Fast timing but obviously possible.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko


More information about the Intel-gfx mailing list