[Intel-gfx] [PATCH igt 3/9] igt/drv_hangman: Inject a true hang

Chris Wilson chris at chris-wilson.co.uk
Wed Dec 16 06:03:02 PST 2015


On Wed, Dec 16, 2015 at 01:51:46PM +0000, Chris Wilson wrote:
> On Wed, Dec 16, 2015 at 03:12:57PM +0200, Ville Syrjälä wrote:
> > On Wed, Dec 16, 2015 at 10:37:55AM +0000, Chris Wilson wrote:
> > > On Wed, Dec 16, 2015 at 10:02:27AM +0100, Daniel Vetter wrote:
> > > > On Sat, Dec 12, 2015 at 08:02:49PM +0000, Chris Wilson wrote:
> > > > > Wean drv_hangman off the atrocious stop_rings and use a real GPU hang
> > > > > instead.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > 
> > > > Doesn't this kill pre-gen6? Or at least anything where we don't have
> > > > proper hang recovery ... Lack of that is why I've done the original
> > > > stop_rings fun.
> > > 
> > > Originally, igt_hang_ring required gen >= 5, but since then Ville has
> > > been working hard on getting reset support working for gen3 and gen4,
> > > now we query the kernel as to whether it can reset the device. So by
> > > switching over we lose testing of simulated hangs and recovery code (or 
> > > KMS handling during wedged) for gen2.
> > > 
> > > It is a loss in test coverage, but the benefit is that we can remove the
> > > hang injection code from the kernel. And that is a tradeoff I am willing
> > > to make.
> > 
> > One day I will get around to trying D3 as a reset mechanism for gen2 ;)
> > Until then, losing the gen2 test coverage seems fairly reasonable.
> > I suppose the only thing it's really testing on gen2 is making sure we
> > don't lock up or oops when the gpu hangs.
> > 
> > So what might be nice is a way to force the hang tests to run even 
> > when gpu reset isn't supported, just to make sure the system doesn't
> > die completely. Obviously it's going to be slow to run such tests due
> > to needing a reboot between every test, but it would be enough to do
> > an occasional spot check to see if there's been a regression.
> 
> Would "export IGT_HANG_WITHOUT_RESET=1" be enough?
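
For illustration, the gating that such a variable implies can be sketched
in standalone C. This is only a sketch: env_flag() and hang_test_allowed()
are hypothetical names, and the atoi()-based parsing is an assumption about
how a boolean environment variable would be read, not a statement about the
library helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: read a boolean environment variable.
 * Assumption: any value that atoi() parses as non-zero counts as true;
 * an unset variable yields the fallback. */
static bool env_flag(const char *name, bool fallback)
{
	const char *val;

	assert(name != NULL);
	val = getenv(name);
	if (val == NULL)
		return fallback;
	return atoi(val) != 0;
}

/* Hypothetical gate: a hang test may run if the kernel can reset the
 * GPU, or if the user explicitly accepts a wedged machine. */
static bool hang_test_allowed(bool kernel_can_reset)
{
	if (env_flag("IGT_HANG_WITHOUT_RESET", false))
		return true;
	return kernel_can_reset;
}
```

With the variable unset, hang_test_allowed(false) returns false and the
test would be skipped; with IGT_HANG_WITHOUT_RESET=1 it returns true
regardless of reset support.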


commit 8e5113502f1a0feb8f5996d7b44f5021aa24a35f
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Dec 11 21:24:21 2015 +0000

    lib: Always double check igt_require_hang_ring() on use
    
    If we move the igt_require() into the hang injector, this makes simple
    test cases even more convenient. More complex test cases can always do
    their own precursory check before setting up the test.
    
    However, this does embed the assumption that the first context we are
    called from is safe (i.e. no i915.enable_hangcheck/i915.reset
    interference).
    
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 688ea5e..5ae1717 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -50,17 +50,21 @@
 
 static bool has_gpu_reset(int fd)
 {
-	struct drm_i915_getparam gp;
-	int val = 0;
-
-	memset(&gp, 0, sizeof(gp));
-	gp.param = 35; /* HAS_GPU_RESET */
-	gp.value = &val;
-
-	if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
-		return intel_gen(intel_get_drm_devid(fd)) >= 5;
-
-	return val > 0;
+	static int once = -1;
+	if (once < 0) {
+		struct drm_i915_getparam gp;
+		int val = 0;
+
+		memset(&gp, 0, sizeof(gp));
+		gp.param = 35; /* HAS_GPU_RESET */
+		gp.value = &val;
+
+		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
+			once = intel_gen(intel_get_drm_devid(fd)) >= 5;
+		else
+			once = val > 0;
+	}
+	return once;
 }
 
 /**
@@ -71,11 +75,29 @@ static bool has_gpu_reset(int fd)
  * Convenience helper to check whether advanced hang injection is supported by
  * the kernel. Uses igt_skip to automatically skip the test/subtest if this
  * isn't the case.
+ *
+ * Note that we can't simply call this from igt_hang_ring since some
+ * tests want to exercise gpu wedging behavior, for which we intentionally
+ * disable gpu reset support but still want to inject a hang; see for example
+ * tests/gem_eio.c. Instead, we expect that the first invocation of
+ * igt_require_hang_ring be from a vanilla context and use the has_gpu_reset()
+ * determined then for all later instances. This allows us the convenience
+ * of double checking when injecting hangs, whilst pushing the complexity
+ * to the tests that are deliberately trying to break the box.
+ *
+ * This function is also controlled by the environment variables:
+ *
+ * IGT_HANG_WITHOUT_RESET (boolean) - if true, allow the hang even if the
+ * kernel does not support GPU recovery. The machine will be wedged afterwards
+ * (and so require a reboot between testing), but it does allow limited testing
+ * to be done under hang injection.
  */
 void igt_require_hang_ring(int fd, int ring)
 {
+	gem_require_ring(fd, ring);
 	gem_context_require_ban_period(fd);
-	igt_require(has_gpu_reset(fd));
+	if (!igt_check_boolean_env_var("IGT_HANG_WITHOUT_RESET", false))
+		igt_require(has_gpu_reset(fd));
 }
 
 /**
@@ -100,6 +122,8 @@ igt_hang_ring_t igt_hang_ring(int fd, int ring)
 	unsigned ban;
 	unsigned len;
 
+	igt_require_hang_ring(fd, ring);
+
 	param.context = 0;
 	param.size = 0;
 	param.param = LOCAL_CONTEXT_PARAM_BAN_PERIOD;
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index e348f26..22c694b 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1241,6 +1241,7 @@ void gem_require_caching(int fd)
 void gem_require_ring(int fd, int ring_id)
 {
 	switch (ring_id) {
+	case I915_EXEC_DEFAULT:
 	case I915_EXEC_RENDER:
 		return;
 	case I915_EXEC_BLT:
@@ -1255,6 +1256,7 @@ void gem_require_ring(int fd, int ring_id)
 		return;
 #endif
 	default:
+		igt_warn("Invalid ring: %d\n", ring_id);
 		igt_assert(0);
 		return;
 	}
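
To illustrate the "first context must be safe" assumption from the commit
message, here is a minimal standalone sketch of the same caching pattern.
The probe function and the fake_kernel_can_reset flag are hypothetical
stand-ins for the GETPARAM query; only the static "once" idiom mirrors the
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's answer to the reset query. */
static bool fake_kernel_can_reset = true;

/* Counts how often the (pretend) kernel is actually asked. */
static int probe_calls;

static bool probe_reset_support(void)
{
	probe_calls++;
	return fake_kernel_can_reset;
}

/* Same shape as the patched has_gpu_reset(): the first answer is
 * cached and returned for every later call. */
static bool has_gpu_reset_cached(void)
{
	static int once = -1;

	if (once < 0)
		once = probe_reset_support();
	return once;
}
```

If the first call happens while reset is available, a test that later
disables reset (flipping fake_kernel_can_reset to false in this sketch)
still sees true from has_gpu_reset_cached(). Conversely, a first call made
after reset was disabled would cache false for the rest of the process.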


-- 
Chris Wilson, Intel Open Source Technology Centre

