[Intel-gfx] [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"
Chris Wilson
chris at chris-wilson.co.uk
Thu Nov 26 04:59:37 PST 2015
On Thu, Nov 26, 2015 at 12:34:35PM +0100, Daniel Vetter wrote:
> Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
> hang we need to be careful to not run into the "hanging too fast"
> check:
>
> - don't restore the ban period, but instead keep it at 0.
> - make sure we idle the gpu fully before hanging it again (wait
> subtest missed that).
>
> With this gem_eio now works reliably even when I don't run the
> subtests individually.
>
> Of course it's a bit fishy that the default ctx gets blamed for
> essentially doing nothing, but until that's figured out in upstream
> it's better to make the test work for now.
This used to be reliable. And just disabling all banning in the kernel
forever more is silly.
During igt_post_hang_ring:
1. we wait upon the hanging batch
   - this returns when hangcheck fires
2. reset the ban period to normal
   - this takes mutex_lock_interruptible and so must wait for the reset
     handler to run before it can make the change,
   - ergo the hanging batch never triggers a ban for itself.
   - (a subsequent nonsimulated GPU hang may trigger the ban though)
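
The ordering argument above can be sketched as a small userspace toy model. This is not kernel or igt code: struct toy_gpu, toy_should_ban() and simulate_post_hang() are hypothetical names, the ban rule is a simplified assumption (ban when the period is non-zero and the previous hang is at least that recent), and a pthread mutex stands in for the lock that both the reset handler and the ban-period write need.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not an i915/igt symbol. */
struct toy_gpu {
	pthread_mutex_t struct_mutex;	/* lock shared by reset and ban-period write */
	pthread_mutex_t wake_lock;	/* protects hang_detected */
	pthread_cond_t wake;		/* wakes the waiter on the hanging batch */
	bool hang_detected;
	unsigned int ban_period;	/* seconds; 0 means "never ban" */
	unsigned int secs_since_last_hang;
	bool banned;
};

/* Assumed, simplified ban rule: a zero period disables banning. */
static bool toy_should_ban(unsigned int ban_period, unsigned int secs_since_last_hang)
{
	return ban_period != 0 && secs_since_last_hang <= ban_period;
}

static void *toy_reset_handler(void *arg)
{
	struct toy_gpu *gpu = arg;

	/* Take the lock first, so the ban-period write cannot overtake us. */
	pthread_mutex_lock(&gpu->struct_mutex);

	/* Hangcheck fires: wake whoever waits on the hanging batch. */
	pthread_mutex_lock(&gpu->wake_lock);
	gpu->hang_detected = true;
	pthread_cond_signal(&gpu->wake);
	pthread_mutex_unlock(&gpu->wake_lock);

	/* Judge the hang with whatever ban period is in effect right now. */
	gpu->banned = toy_should_ban(gpu->ban_period, gpu->secs_since_last_hang);

	pthread_mutex_unlock(&gpu->struct_mutex);
	return NULL;
}

/* Returns true if the simulated hang got itself banned. */
static bool simulate_post_hang(unsigned int normal_ban_period,
			       unsigned int secs_since_last_hang)
{
	struct toy_gpu gpu = {
		.ban_period = 0,	/* banning was disabled before hanging */
		.secs_since_last_hang = secs_since_last_hang,
	};
	pthread_t thread;
	int err;

	pthread_mutex_init(&gpu.struct_mutex, NULL);
	pthread_mutex_init(&gpu.wake_lock, NULL);
	pthread_cond_init(&gpu.wake, NULL);

	err = pthread_create(&thread, NULL, toy_reset_handler, &gpu);
	assert(err == 0);

	/* Step 1: wait upon the hanging batch; returns when hangcheck fires. */
	pthread_mutex_lock(&gpu.wake_lock);
	while (!gpu.hang_detected)
		pthread_cond_wait(&gpu.wake, &gpu.wake_lock);
	pthread_mutex_unlock(&gpu.wake_lock);

	/* Step 2: restore the ban period; blocks until the reset handler is done. */
	pthread_mutex_lock(&gpu.struct_mutex);
	gpu.ban_period = normal_ban_period;
	pthread_mutex_unlock(&gpu.struct_mutex);

	err = pthread_join(thread, NULL);
	assert(err == 0);

	pthread_cond_destroy(&gpu.wake);
	pthread_mutex_destroy(&gpu.wake_lock);
	pthread_mutex_destroy(&gpu.struct_mutex);
	return gpu.banned;
}
```

In this model simulate_post_hang(12, 0) returns false even though the previous hang was zero seconds ago, because the restore in step 2 can only land after the handler has judged the hang with a period of 0 (the value 12 is an arbitrary stand-in for the normal period). toy_should_ban(12, 0) returns true, which is the "subsequent nonsimulated hang" case: once the period is back to normal, a real hang arriving quickly can still be banned.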
Nak.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre