[Intel-gfx] [PATCH 2/2] tests/gem_eio: Resilience against "hanging too fast"

Chris Wilson chris at chris-wilson.co.uk
Thu Nov 26 04:59:37 PST 2015


On Thu, Nov 26, 2015 at 12:34:35PM +0100, Daniel Vetter wrote:
> Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
> hang we need to be careful to not run into the "hanging too fast"
> check:
> 
> - don't restore the ban period, but instead keep it at 0.
> - make sure we idle the gpu fully before hanging it again (wait
>   subtest missed that).
> 
> With this gem_eio now works reliably even when I don't run the
> subtests individually.
> 
> Of course it's a bit fishy that the default ctx gets blamed for
> essentially doing nothing, but until that's figured out in upstream
> it's better to make the test work for now.

This used to be reliable. And just disabling all banning in the kernel
forever more is silly.

During igt_post_hang_ring:
1. we wait upon the hanging batch
 - this returns when hangcheck fires
2. reset the ban period to normal
 - this takes mutex_lock_interruptible and so must wait for the reset
   handler to run before it can make the change,
 - ergo the hanging batch never triggers a ban for itself.
 - (a subsequent non-simulated GPU hang may still trigger the ban, though)
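
The ordering argument above can be sketched as a toy userspace model. This
is not kernel or igt code: every name in it (struct model, reset_handler,
simulate_post_hang) is hypothetical, a single pthread mutex stands in for
the lock that both the reset handler and the ban-period update need, and
the "hanging too fast" check is reduced to one comparison. It only
illustrates why the restore in step 2 cannot land before the reset handler
has judged the simulated hang.

```c
/* Toy model only; all names are hypothetical, not i915 or igt API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model {
    pthread_mutex_t struct_mutex;  /* lock shared by reset and ban update */
    pthread_mutex_t flag_lock;     /* protects hangcheck_fired */
    pthread_cond_t flag_cond;
    bool hangcheck_fired;          /* wakes the waiter on the hanging batch */
    unsigned ban_period;           /* 0 means banning is disabled */
    unsigned secs_since_last_hang;
    bool banned;
};

struct result {
    bool banned;
    unsigned ban_period;
};

static void *reset_handler(void *arg)
{
    struct model *m = arg;

    /* Assumption: the handler owns the lock before any waiter is woken. */
    pthread_mutex_lock(&m->struct_mutex);

    pthread_mutex_lock(&m->flag_lock);
    m->hangcheck_fired = true;
    pthread_cond_signal(&m->flag_cond);
    pthread_mutex_unlock(&m->flag_lock);

    /* Simplified "hanging too fast" check with the ban period set right now. */
    m->banned = m->secs_since_last_hang < m->ban_period;

    pthread_mutex_unlock(&m->struct_mutex);
    return NULL;
}

struct result simulate_post_hang(unsigned normal_ban_period,
                                 unsigned secs_since_last_hang)
{
    struct model m = { .ban_period = 0, /* test disabled banning earlier */
                       .secs_since_last_hang = secs_since_last_hang };
    pthread_t handler;
    int err;

    pthread_mutex_init(&m.struct_mutex, NULL);
    pthread_mutex_init(&m.flag_lock, NULL);
    pthread_cond_init(&m.flag_cond, NULL);

    err = pthread_create(&handler, NULL, reset_handler, &m);
    assert(err == 0);

    /* Step 1: wait upon the hanging batch; returns once hangcheck fires. */
    pthread_mutex_lock(&m.flag_lock);
    while (!m.hangcheck_fired)
        pthread_cond_wait(&m.flag_cond, &m.flag_lock);
    pthread_mutex_unlock(&m.flag_lock);

    /* Step 2: restore the ban period; blocks until the handler is done. */
    pthread_mutex_lock(&m.struct_mutex);
    m.ban_period = normal_ban_period;
    pthread_mutex_unlock(&m.struct_mutex);

    pthread_join(handler, NULL);
    pthread_cond_destroy(&m.flag_cond);
    pthread_mutex_destroy(&m.flag_lock);
    pthread_mutex_destroy(&m.struct_mutex);

    return (struct result){ .banned = m.banned, .ban_period = m.ban_period };
}
```

In this model, simulate_post_hang(12, 1) (both values arbitrary) leaves
banned false and ban_period at 12: the handler evaluated the hang while the
period was still 0, and only afterwards did the restore take effect.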

Nak.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

