[Intel-gfx] [PATCH i-g-t] lib: don't hang on blt on snb
Daniel Vetter
daniel at ffwll.ch
Mon Aug 7 16:26:56 UTC 2017
On Fri, Aug 04, 2017 at 06:05:10PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-08-04 17:07:22)
> > We now have full (or a lot at least) igt running in beta CI, and snb
> > blt hangs are really unhappy:
> >
> > - drv_hangman@error-state-capture-blt and gem_exec_capture@capture-blt
> > reliably result in insta-machine death when we try to reset the gpu,
> > both on the CI snb and the one I have here.
> >
> > - Other testcases also randomly (and sometimes rather rarely) die on
> > snb.
> >
> > We can't use the endless batch because that results in a reset failure
> > and wedged gpu, so also not really better.
>
> It shouldn't be the recursion, but the invalid instruction we use to try
> and trigger the hang quicker (otherwise hangcheck may see the advancing
> ACTHD and give us longer to escape the loop).
>
> In gem_exec_capture we shouldn't even need that invalid instruction, we
> just need the busy batch as we pull the trigger ourselves, and if that
> fails to reset a simple recursive batch we have some issues to resolve.
An endless loop for hanging results in a reset failure on blt, as described in
the commit message. We end up with a permanent and unrecoverable -EIO,
which is as deadly to CI as outright killing the machine.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch