[Intel-gfx] [PATCH] drm/i915: add interface to simulate gpu hangs

Ben Widawsky ben at bwidawsk.net
Tue Dec 6 00:20:59 CET 2011


On Fri, Dec 02, 2011 at 11:21:49PM +0100, Daniel Vetter wrote:
> gpu reset is a very important piece of our infrastructure.
> Unfortunately we only really test it by actually hanging the gpu,
> which often has bad side-effects for the entire system. And the gpu
> hang handling code is one of the rather complicated pieces of code we
> have, consisting of
> - hang detection
> - error capture
> - actual gpu reset
> - reset of all the gem bookkeeping
> - reinitialization of the entire gpu
> 
> This patch adds a debugfs file to selectively stop rings by ceasing to
> update the hw tail pointer, which will result in the gpu no longer
> updating its head pointer and eventually in the hangcheck firing.
> This way we can exercise the gpu hang code under controlled conditions
> without a dying gpu taking down the entire system.
> 
> Patch motivated by me forgetting to properly reinitialize ppgtt after
> a gpu reset.
> 
> Usage:
> 
> echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
> 
> echo 0xffffffff > i915_ring_stop # stops all, future-proof version
> 
> then run whatever testload is desired. i915_ring_stop automatically
> resets after a gpu hang is detected to avoid hanging the gpu too fast
> and declaring it wedged.
> 
> v2: Incorporate feedback from Chris Wilson.
> 
> v3: Add the missing cleanup.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
Acked-by: Ben Widawsky <ben at bwidawsk.net>
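For driving this from a test harness instead of the shell, the mask arithmetic from the Usage section fits in a few lines of C. This is a minimal sketch, not part of the patch: ring_stop_mask() and write_ring_stop() are hypothetical helpers, and the debugfs path (something like /sys/kernel/debug/dri/0/i915_ring_stop) is an assumption that depends on where debugfs is mounted and which DRM minor the device uses. The value is written in decimal, just like `echo $((1 << $ringnum))`.

```c
#include <stdint.h>
#include <stdio.h>

/* Mask selecting one ring, equivalent to $((1 << $ringnum)) in the shell.
 * ringnum must be below 32; larger shifts are undefined behaviour. */
static uint32_t ring_stop_mask(unsigned int ringnum)
{
	return UINT32_C(1) << ringnum;
}

/* Hypothetical helper: write a mask to the i915_ring_stop debugfs file.
 * The caller supplies the path (assumed, e.g.
 * /sys/kernel/debug/dri/0/i915_ring_stop).
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_ring_stop(const char *path, uint32_t mask)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	/* Decimal output, matching what the echo example writes. */
	ok = fprintf(f, "%lu\n", (unsigned long)mask) > 0;
	/* fclose() flushes the buffer; a failure here means the write
	 * never reached the kernel. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would do write_ring_stop(path, ring_stop_mask(1)) to stop a single ring, or pass 0xffffffff to stop all of them, as in the second echo example. Since i915_ring_stop resets itself once the hang is detected, no explicit cleanup write should be needed afterwards.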