[Intel-gfx] [PATCH 3/6] drm/i915/gt: Allow failed resets without assertion
Andi Shyti
andi.shyti at intel.com
Tue Jan 5 01:55:16 UTC 2021
Hi Chris,
On Mon, Jan 04, 2021 at 11:51:42AM +0000, Chris Wilson wrote:
> If the engine reset fails, we will attempt to resume with the current
> inflight submissions. When that happens, we cannot assert that the
> engine reset cleared the pending submission, so do not.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2878
> Fixes: 16f2941ad307 ("drm/i915/gt: Replace direct submit with direct call to tasklet")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 +
> .../drm/i915/gt/intel_execlists_submission.c | 6 +-
> drivers/gpu/drm/i915/gt/intel_reset.c | 3 +
> drivers/gpu/drm/i915/gt/selftest_execlists.c | 75 +++++++++++++++++++
> 4 files changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index c28f4e190fe6..430066e5884c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -561,6 +561,8 @@ struct intel_engine_cs {
> unsigned long stop_timeout_ms;
> unsigned long timeslice_duration_ms;
> } props, defaults;
> +
> + I915_SELFTEST_DECLARE(struct fault_attr reset_timeout);
> };
>
> static inline bool
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 2afbc0a4ca03..f02e3ae10d28 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3047,9 +3047,13 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
> * After a GPU reset, we may have requests to replay. Do so now while
> * we still have the forcewake to be sure that the GPU is not allowed
> * to sleep before we restart and reload a context.
> + *
> + * If the GPU reset fails, the engine may still be alive with requests
> + * inflight. We expect those to complete, or for the device to be
> + * reset as the next level of recovery, and as a final resort we
> + * will declare the device wedged.
> */
> GEM_BUG_ON(!reset_in_progress(execlists));
> - GEM_BUG_ON(engine->execlists.pending[0]);
I would have split this into two patches, but it looks good anyway.
Reviewed-by: Andi Shyti <andi.shyti at intel.com>
Thanks,
Andi
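To illustrate why the GEM_BUG_ON(engine->execlists.pending[0]) check had to go, here is a minimal user-space model of the reasoning in the commit message. It is a hypothetical sketch, not i915 code. Every name in it (struct toy_engine, toy_engine_reset(), toy_reset_finish(), toy_recover(), the inject_reset_timeout flag standing in for the selftest's reset_timeout fault_attr) is invented for the example. It shows that when the engine reset fails, pending[0] can still be set once reset_finish runs. An unconditional assertion there would fire, and recovery must instead escalate to the next level (a full device reset, and finally wedging).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the real i915 types. */
struct toy_request { int seqno; };

struct toy_engine {
	struct toy_request *pending[2]; /* submissions not yet acked by the HW */
	bool reset_in_progress;
	bool inject_reset_timeout;      /* stand-in for the selftest fault_attr */
};

/* Next recovery step after an engine reset attempt. */
enum toy_recovery { RECOVERED_ENGINE, ESCALATE_DEVICE_RESET };

/* Returns true if the modelled engine reset cleared the inflight state. */
static bool toy_engine_reset(struct toy_engine *e)
{
	if (e->inject_reset_timeout)
		return false; /* reset failed: inflight submissions are left as-is */
	e->pending[0] = NULL;
	e->pending[1] = NULL;
	return true;
}

static void toy_reset_finish(struct toy_engine *e)
{
	assert(e->reset_in_progress); /* mirrors the GEM_BUG_ON that is kept */
	/*
	 * The removed check was equivalent to assert(!e->pending[0]).
	 * After a failed reset pending[0] may still be set, so it is not
	 * asserted; those requests are expected to complete or to be
	 * cleared by a device-level reset.
	 */
	e->reset_in_progress = false;
}

static enum toy_recovery toy_recover(struct toy_engine *e)
{
	bool ok;

	e->reset_in_progress = true;
	ok = toy_engine_reset(e);
	toy_reset_finish(e);
	/* A real driver would try a device reset next and, failing that, wedge. */
	return ok ? RECOVERED_ENGINE : ESCALATE_DEVICE_RESET;
}
```

For example, with a request in pending[0] and inject_reset_timeout set, toy_recover() returns ESCALATE_DEVICE_RESET and pending[0] still points at the request. With the old assertion in place, that path would have aborted instead.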