[Intel-gfx] [PATCH i-g-t] lib: don't hang on blt on snb

Chris Wilson chris at chris-wilson.co.uk
Mon Aug 7 16:34:03 UTC 2017


Quoting Daniel Vetter (2017-08-07 17:26:56)
> On Fri, Aug 04, 2017 at 06:05:10PM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-08-04 17:07:22)
> > > We now have full (or a lot at least) igt running in beta CI, and snb
> > > blt hangs are really unhappy:
> > > 
> > > - drv_hangman@error-state-capture-blt and gem_exec_capture@capture-blt
> > >   reliably result in insta-machine death when we try to reset the gpu,
> > >   both on the CI snb and the one I have here.
> > > 
> > > - Other testcases also randomly (and sometimes rather rarely) die on
> > >   snb.
> > > 
> > > We can't use the endless batch because that results in a reset failure
> > > and wedged gpu, so also not really better.
> > 
> > It shouldn't be the recursion, but the invalid instruction we use to try
> > and trigger the hang quicker (otherwise hangcheck may see the advancing
> > ACTHD and give us longer to escape the loop).
> > 
> > In gem_exec_capture we shouldn't even need that invalid instruction, we
> > just need the busy batch as we pull the trigger ourselves, and if that
> > fails to reset a simple recursive batch we have some issues to resolve.
> 
> Endless loop for hanging results in a reset failure on blt as described in
> the commit message. We end up with a permanent and unrecoverable -EIO,
> which is as deadly to CI as outright killing the machine.

No, it doesn't. This is from an snb-gt1 that exhibits the machine death on
the invalid blt instruction as reported, after fixes:

Subtest error-state-basic: SUCCESS (0.001s)
Subtest error-state-capture-render: SUCCESS (7.740s)
Subtest error-state-capture-bsd: SUCCESS (6.024s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd1: SKIP (0.000s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd2: SKIP (0.000s)
Subtest error-state-capture-blt: SUCCESS (13.965s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-vebox: SKIP (0.000s)

Subtest capture-render: SUCCESS (0.003s)
Test requirement not met in function __real_main175, file gem_exec_capture.c:202:
Test requirement: gem_can_store_dword(fd, e->exec_id | e->flags)
Subtest capture-bsd: SKIP (0.000s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-bsd1: SKIP (0.000s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-bsd2: SKIP (0.000s)
Subtest capture-blt: SUCCESS (0.002s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-vebox: SKIP (0.000s)

-Chris
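
For readers following the thread, the two hang styles discussed above (a
plain recursive batch versus one led by an invalid instruction) can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the igt implementation: build_recursive_batch() is a
hypothetical helper, 0xffffffff merely stands in for "some invalid opcode",
the MI_BATCH_BUFFER_START encoding is the simplified two-dword gen6 form
with no address-space bits, and a real test would have the kernel fill in
the jump address through a relocation rather than take it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MI command opcodes live in bits 28:23 of the first dword. */
#define MI_BATCH_BUFFER_END   (0x0Au << 23)
#define MI_BATCH_BUFFER_START (0x31u << 23)

/*
 * Hypothetical helper: write a batch that jumps back to its own start.
 * self_offset is the GPU address of batch[0] (assumed already known).
 * If with_invalid is set, an invalid dword is placed first so that the
 * engine stalls immediately instead of spinning with an advancing ACTHD.
 * Returns the number of dwords written, or 0 if the buffer is too small.
 */
static size_t build_recursive_batch(uint32_t *batch, size_t max_dwords,
                                    uint32_t self_offset, int with_invalid)
{
	size_t need = with_invalid ? 4 : 3;
	size_t n = 0;

	assert(batch != NULL);
	if (max_dwords < need)
		return 0;

	if (with_invalid)
		batch[n++] = 0xffffffffu;       /* assumption: stand-in invalid opcode */

	batch[n++] = MI_BATCH_BUFFER_START;     /* one address dword follows */
	batch[n++] = self_offset;               /* loop back to batch[0] */
	batch[n++] = MI_BATCH_BUFFER_END;       /* not reached while the loop runs */

	return n;
}
```

Calling it with with_invalid = 0 gives the "simple recursive batch" that
only stays busy until something else pulls the trigger, which is the form
argued above to be sufficient for gem_exec_capture.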

