[Intel-gfx] [PATCH] drm/i915: Try harder to reset the gen8+ engines
Mika Kuoppala
mika.kuoppala at linux.intel.com
Tue Sep 6 13:25:11 UTC 2016
Chris Wilson <chris at chris-wilson.co.uk> writes:
> On Tue, Sep 06, 2016 at 02:11:47PM +0300, Mika Kuoppala wrote:
>>
>> Resending my r-b...
>
> It's not enough, even retrying the reset a few times, we still
> eventually get a timeout.
>
> This is just the usual
> 10: 0xffffffff
> 20: goto 10
> hanging batch
>>
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>>
>> > If at first we don't succeed, try again.
>> >
>> > Running the reset and recovery routines in a loop ends in a "reset
>> > request timeout" with a mtbf of an hour on Braswell. This is eerily
>> > similar to the unrecoverable reset condition that first prompted us to
>> > use the reset-request mechanism in commit 7fd2d26921d1 ("drm/i915: Reset
>> > request handling for gen8+"). Repeating the reset request makes the
>> > failure much harder to reproduce (but there is no reason to believe that
>> > it is more than mere paper over a timing or other issue).
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> > Cc: Arun Siluvery <arun.siluvery at linux.intel.com>
>> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> > Cc: stable at vger.kernel.org
>> > ---
>> > drivers/gpu/drm/i915/intel_uncore.c | 24 +++++++++++++-----------
>> > 1 file changed, 13 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> > index e9f68cd56e32..1be8ced03ba5 100644
>> > --- a/drivers/gpu/drm/i915/intel_uncore.c
>> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> > @@ -1688,20 +1688,22 @@ int intel_wait_for_register(struct drm_i915_private *dev_priv,
>> > static int gen8_request_engine_reset(struct intel_engine_cs *engine)
>> > {
>> > struct drm_i915_private *dev_priv = engine->i915;
>> > - int ret;
>> > + int loop = 3;
>> >
>>
>> retries?
>>
>> > - I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
>> > - _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
>> > + do {
>> > + I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
>> > + _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
>> >
>> > - ret = intel_wait_for_register_fw(dev_priv,
>> > - RING_RESET_CTL(engine->mmio_base),
>> > - RESET_CTL_READY_TO_RESET,
>> > - RESET_CTL_READY_TO_RESET,
>> > - 700);
>> > - if (ret)
>> > - DRM_ERROR("%s: reset request timeout\n", engine->name);
>> > + if (!intel_wait_for_register_fw(dev_priv,
>> > + RING_RESET_CTL(engine->mmio_base),
>> > + RESET_CTL_READY_TO_RESET,
>> > + RESET_CTL_READY_TO_RESET,
>> > + 700))
>>
>>
>> Did you check between the write didn't stick vs the readyness didn't
>> signal?
>
> The read will flush the write. So reading back the bit we just set shows
> us that the hw is not yet ready.
>
> Shouldn't we also be waiting for the reest bit to clear on cancel? And
> verifying that the reset itself did?
>
Request -> timeout -> cancel and wait 2 lowest bit to be zero and
then only after that retry?
And would not hurt to check they are zero before continuing after
a succesful reset also.
-Mika
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 6b84bd63310c..36d9a485b788 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1701,6 +1701,9 @@ static void gen8_unrequest_engine_reset(struct intel_engine_cs *engine)
>
> I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
> + intel_wait_for_register_fw(dev_priv,
> + RING_RESET_CTL(engine->mmio_base),
> + RESET_CTL_READY_TO_RESET, 0, 700);
> }
>
> static int gen8_reset_engines(struct drm_i915_private *dev_priv,
> @@ -1708,18 +1711,18 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv,
> {
> struct intel_engine_cs *engine;
> unsigned int tmp;
> + int ret = -EIO;
>
> for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
> if (gen8_request_engine_reset(engine))
> goto not_ready;
>
> - return gen6_reset_engines(dev_priv, engine_mask);
> + ret = gen6_reset_engines(dev_priv, engine_mask);
>
> -not_ready:
> for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
> gen8_unrequest_engine_reset(engine);
>
> - return -EIO;
> + return ret;
> }
>
> typedef int (*reset_func)(struct drm_i915_private *, unsigned engine_mask);
>
> --
> Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list