[Intel-gfx] [PATCH 2/2] drm/i915: Force reset on unready engine
Chris Wilson
chris at chris-wilson.co.uk
Mon Aug 13 08:32:24 UTC 2018
Quoting Mika Kuoppala (2018-08-13 09:18:07)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Quoting Mika Kuoppala (2018-08-10 15:00:36)
> >> If engine reports that it is not ready for reset, we
> >> give up. Evidence shows that forcing a per engine reset
> >> on an engine which is not reporting to be ready for reset,
> >> can bring it back into a working order. There is risk that
> >> we corrupt the context image currently executing on that
> >> engine. But that is a risk worth taking as if we unblock
> >> the engine, we prevent a whole device wedging in a case
> >> of full gpu reset.
> >>
> >> Reset individual engine even if it reports that it is not
> >> prepared for reset, but only if we aim for full gpu reset
> >> and not on first reset attempt.
> >>
> >> v2: force reset only on later attempts, readability (Chris)
> >>
> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >> ---
> >> drivers/gpu/drm/i915/intel_uncore.c | 49 +++++++++++++++++++++++------
> >> 1 file changed, 39 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> >> index 027d14574bfa..d24026839b17 100644
> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> >> @@ -2099,9 +2099,6 @@ static int gen8_reset_engine_start(struct intel_engine_cs *engine)
> >> RESET_CTL_READY_TO_RESET,
> >> 700, 0,
> >> NULL);
> >> - if (ret)
> >> - DRM_ERROR("%s: reset request timeout\n", engine->name);
> >> -
> >> return ret;
> >> }
> >>
> >> @@ -2113,6 +2110,42 @@ static void gen8_reset_engine_cancel(struct intel_engine_cs *engine)
> >> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
> >> }
> >>
> >> +static int reset_engines(struct drm_i915_private *i915,
> >> + unsigned int engine_mask,
> >> + unsigned int retry)
> >> +{
> >> + if (INTEL_GEN(i915) >= 11)
> >> + return gen11_reset_engines(i915, engine_mask);
> >> + else
> >> + return gen6_reset_engines(i915, engine_mask, retry);
> >> +}
> >> +
> >> +static int gen8_prepare_engine_for_reset(struct intel_engine_cs *engine,
> >> + unsigned int retry)
> >> +{
> >> + const bool force_reset_on_non_ready = retry >= 1;
> >> + int ret;
> >> +
> >> + ret = gen8_reset_engine_start(engine);
> >> +
> >> + if (ret && force_reset_on_non_ready) {
> >> + /*
> >> + * Try to unblock a single non-ready engine by risking
> >> + * context corruption.
> >> + */
> >> + ret = reset_engines(engine->i915,
> >> + intel_engine_flag(engine),
> >> + retry);
> >> + if (!ret)
> >> + ret = gen8_reset_engine_start(engine);
> >> +
> >> + DRM_ERROR("%s: reset request timeout, forcing reset (%d)\n",
> >> + engine->name, ret);
> >
> > This looks dubious now ;)
> >
> > After the force you then do a reset in the caller. Twice the reset for
> > twice the unpreparedness.
>
> It is intentional. First we make the engine unstuck and then do
> a full cycle with ready to reset involved. I don't know if
> it really matters tho. It could be that the engine is already
> in pristine condition after first.
Looks extremely weird way of going about it. If you want to do a double
reset after the first try fails, try
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4e5826045cbf..6fe137d7d455 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -450,6 +450,8 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
GEM_TRACE("engine_mask=%x\n", engine_mask);
preempt_disable();
ret = reset(i915, engine_mask);
+ if (retry > 0 && !ret) /* Double check reset worked */
+ ret = reset(i915, engine_mask);
preempt_enable();
}
if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
as a separate patch.
P.S. you really need to review the i915_reset.c patch ;)
-Chris
More information about the Intel-gfx
mailing list