[Intel-gfx] [PATCH] drm/i915/selftests: Try to recover from a wedged GPU during reset tests
Chris Wilson
chris at chris-wilson.co.uk
Tue Sep 26 13:03:11 UTC 2017
Quoting Mika Kuoppala (2017-09-26 13:48:17)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > If we see the seqno stop progressing, we abandon the test for fear that
> > the GPU died following the reset. However, during test teardown we still
> > wait for the GPU to idle before continuing, but we have already
> > confirmed that the GPU is dead. Furthermore, since we are inside a reset
> > test, we have disabled the hangchecker, and so there is no safety net and
> > we wait indefinitely. Detect the stuck GPU and declare it wedged as a
> > state of emergency so we can escape.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Jari Tahvanainen <jari.tahvanainen at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 25 +++++++++++++++++++-----
> > 1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index 02e52a146ed8..913fe752f6b4 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -165,6 +165,7 @@ static int emit_recurse_batch(struct hang *h,
> > *batch++ = lower_32_bits(vma->node.start);
> > }
> > *batch++ = MI_BATCH_BUFFER_END; /* not reached */
> > + wmb();
> >
>
> Why not the big hammer with i915_gem_chipset_flush() here?
It didn't cross my mind; I was just doodling :)
>
> > flags = 0;
> > if (INTEL_GEN(vm->i915) <= 5)
> > @@ -621,7 +622,12 @@ static int igt_wait_reset(void *arg)
> > __i915_add_request(rq, true);
> >
> > if (!wait_for_hang(&h, rq)) {
> > - pr_err("Failed to start request %x\n", rq->fence.seqno);
> > + pr_err("Failed to start request %x, at %x\n",
> > + rq->fence.seqno, hws_seqno(&h, rq));
> > +
> > + i915_reset(i915, 0);
> > + i915_gem_set_wedged(i915);
> > +
> > err = -EIO;
> > goto out_rq;
> > }
> > @@ -708,10 +714,14 @@ static int igt_reset_queue(void *arg)
> > __i915_add_request(rq, true);
> >
> > if (!wait_for_hang(&h, prev)) {
> > - pr_err("Failed to start request %x\n",
> > - prev->fence.seqno);
> > + pr_err("Failed to start request %x, at %x\n",
> > + rq->fence.seqno, hws_seqno(&h, rq));
>
> As you pointed out, the debug message here is for the wrong request.
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Happy if I drop the wmb() for a later patch and replace it with a
chipset flush instead?
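
For reference, the escape hatch described in the commit message boils down to a bounded wait followed by declaring the device wedged. Below is a rough userspace sketch of that idea only; struct fake_gpu, wait_for_seqno() and teardown_wait_idle() are hypothetical names made up for illustration and are not the i915 code or API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; not the i915 structures. */
struct fake_gpu {
	uint32_t seqno;  /* last sequence number the "hardware" reported */
	uint32_t step;   /* seqno advance per poll; 0 models a hung GPU */
	bool wedged;     /* set once we have given up on the device */
};

/* Wrap-safe check that seqno has reached (or passed) target. */
static bool seqno_passed(uint32_t seqno, uint32_t target)
{
	return (int32_t)(seqno - target) >= 0;
}

/* Poll for target, giving up after max_polls attempts instead of spinning forever. */
static bool wait_for_seqno(struct fake_gpu *gpu, uint32_t target,
			   unsigned int max_polls)
{
	assert(max_polls > 0);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (seqno_passed(gpu->seqno, target))
			return true;
		gpu->seqno += gpu->step; /* simulate the hardware making progress */
	}

	return seqno_passed(gpu->seqno, target);
}

/*
 * Teardown path: with no hangcheck to rescue us, an unbounded wait on a dead
 * GPU never returns. Detect the stall, mark the device wedged, and report
 * -EIO so the caller can escape.
 */
static int teardown_wait_idle(struct fake_gpu *gpu, uint32_t target,
			      unsigned int max_polls)
{
	if (gpu->wedged)
		return -EIO; /* already declared dead; do not wait again */

	if (wait_for_seqno(gpu, target, max_polls))
		return 0;

	gpu->wedged = true;
	return -EIO;
}
```

In this model a GPU with step == 0 never advances, so teardown_wait_idle() returns -EIO after max_polls attempts and every later call fails immediately rather than blocking.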
-Chris