[Intel-gfx] [PATCH igt] lib/gt: Insert an arbitration point in our hang batch
Chris Wilson
chris at chris-wilson.co.uk
Fri Oct 27 13:30:40 UTC 2017
Quoting Ville Syrjälä (2017-10-27 14:24:16)
> On Fri, Oct 27, 2017 at 01:59:17PM +0100, Chris Wilson wrote:
> > Quoting Ville Syrjälä (2017-10-27 13:54:49)
> > > On Fri, Oct 27, 2017 at 01:45:35PM +0100, Chris Wilson wrote:
> > > > A purely recursive batch has the downside that it is a severe drain on
> > > > system resources (see commit f978cc027cd0 "lib/dummyload: Pad with a few
> > > > nops so that we do not completely hog the system") which can result in
> > > > the test being starved and failing to make reasonable progress. For more
> > > > reliable resets, also include an arbitration point. This should lessen
> > > > the efficacy of the hang...
> > > >
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > > > ---
> > > > lib/igt_gt.c | 10 +++++-----
> > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> > > > index 89727d22..2aebad14 100644
> > > > --- a/lib/igt_gt.c
> > > > +++ b/lib/igt_gt.c
> > > > @@ -313,12 +313,12 @@ igt_hang_t igt_hang_ctx(int fd,
> > > > len = 2;
> > > > if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> > > > len++;
> > > > - b[0] = MI_BATCH_BUFFER_START | (len - 2);
> > > > - b[len] = MI_BATCH_BUFFER_END;
> > > > - b[len+1] = MI_NOOP;
> > > > - gem_write(fd, exec.handle, 0, b, sizeof(b));
> > > > + b[0] = 0x5 << 24; /* ARB_CHk */
> > >
> > > That seems to be gen4+ only. Also
> > > "This instruction can be placed only in a ring buffer, never in a batch
> > > buffer."
> >
> > Any idea since/until when? It's definitely listed for batches on gen8+.
>
> If I'm reading things correctly then IVB+ can have it in an RCS batch.
> But for the other engines it seems that the "ring only" restriction
> still applies, until some *future* gen.
>
> > The good news being an unrecognised MI cmd it is skipped. Better news if
> > it causes a GPU hang ;)
>
> Based on the description it could be that it just blindly jumps to UHPTR
> and there wouldn't be any way to get back to the batch. Maybe. That's
> assuming UHPTR would have the valid bit set in the first place. I guess
> if UHPTR isn't valid then nothing should happen whether or not the
> parser accepts the command.
Fortunately we never set the valid bit for UHPTR. Unfortunately we never
clear it either. (Or set the INSTPM for autoclear). However, because the
GPU often seems to be working, I conclude that bit is rarely randomly
set.
> I don't actually know how you would get back to the previous point on
> the ring either after jumping to UHPTR from the ring. Is the previous
> HEAD saved somewhere?
Afaict, it basically bumps RING_HEAD to UHPTR and so continues to
RING_TAIL. (And presumably then jumps back to UHPTR upon hitting that
preemption point unless you manage to clear UHPTR in time.)
But I'm guessing as I haven't successfully used it yet.
-Chris
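
For reference, the batch layout the patch is aiming for (an arbitration point followed by a jump back to the start of the same batch) can be sketched roughly as below. This is only an illustration, not the igt code: build_hang_batch() and mi_opcode() are hypothetical helpers, batch_addr stands in for the address that igt would normally fill in via a relocation, and the MI_* values assume the opcode-in-bits-28:23 encoding used by the kernel's i915 headers. Any address-space flags on MI_BATCH_BUFFER_START are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MI command encoding: opcode in bits 28:23, flags/length below. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))
#define MI_NOOP                 MI_INSTR(0x00, 0)
#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
#define MI_BATCH_BUFFER_END     MI_INSTR(0x0a, 0)
#define MI_BATCH_BUFFER_START   MI_INSTR(0x31, 0)

/*
 * Hypothetical helper: write a self-referencing batch into b[].
 * The batch starts with an arbitration point and then jumps back to
 * batch_addr (its own start), so it never reaches the trailing
 * MI_BATCH_BUFFER_END. Returns the number of dwords written, or 0 if
 * b[] is too small.
 */
static size_t build_hang_batch(uint32_t *b, size_t max, int gen,
                               uint64_t batch_addr)
{
	/* MI_BATCH_BUFFER_START is 2 dwords before gen8, 3 dwords from gen8. */
	size_t len = gen >= 8 ? 3 : 2;
	size_t i = 0;

	assert(b != NULL);
	if (max < len + 3)
		return 0;

	b[i++] = MI_ARB_CHECK;
	b[i++] = MI_BATCH_BUFFER_START | (uint32_t)(len - 2);
	b[i++] = (uint32_t)batch_addr;                 /* low 32 bits */
	if (gen >= 8)
		b[i++] = (uint32_t)(batch_addr >> 32); /* high 32 bits */
	b[i++] = MI_BATCH_BUFFER_END;
	b[i++] = MI_NOOP;

	return i;
}

/* Hypothetical helper: extract the MI opcode field from a command dword. */
static unsigned int mi_opcode(uint32_t dword)
{
	return (dword >> 23) & 0x3f;
}
```

With that encoding, mi_opcode(0x5 << 24) comes out as 0x0a, the same opcode as MI_BATCH_BUFFER_END in the defines above, whereas 0x5 << 23 gives 0x05. So the shift used for the ARB_CHECK dword is probably worth double-checking against the headers.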