[Intel-gfx] [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space

Thu Feb 25 11:32:38 UTC 2016

On Thu, Feb 25, 2016 at 11:12:06AM +0000, Daniele Ceraolo Spurio wrote:
> 
> 
> On 25/02/16 10:41, Chris Wilson wrote:
> >On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio at intel.com wrote:
> >>+/* This test covers the case where we end up in an uninitialised area of the
> >>+ * ppgtt at an offset greater than the one where the last buffer is mapped. This
> >>+ * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
> >>+ * massively bigger compared to the 32b case and it takes a lot more time to
> >>+ * wrap, so the acthd can potentially keep increasing for a long time
> >>+ */
> >>+#define NSEC_PER_SEC	1000000000L
> >>+static void ppgtt_walking(void)
> >>+{
> >>+	int fd;
> >>+	int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
> >This needs a note that this has to be greater than ~5*hangcheck.
> >
> >>+	struct drm_i915_gem_execbuffer2 execbuf;
> >>+	struct drm_i915_gem_exec_object2 gem_exec;
> >>+	uint32_t handle;
> >>+	uint32_t batch[4];
> >>+
> >>+	fd = drm_open_driver(DRIVER_INTEL);
> >>+	igt_require(gem_gtt_type(fd) > 2);
> >Nope, just full-ppgtt is required (and provides a sensible hangcheck
> >test if !48bit as well).
> >
> >Note this does require that the hangcheck is enabled, so igt_require().
> >
> >>+
> >>+	/* the batch will be mapped to an offset < 4GB because the flag to allow
> >>+	 * 48b offsets is not specified, so jump to address 0x00000001 00000000
> >>+	 */
> >>+	batch[0] = MI_BATCH_BUFFER_START | 1;
> >>+	batch[1] = 0;
> >>+	batch[2] = 1;
> >>+	batch[3] = MI_BATCH_BUFFER_END;
> >The vm is entirely empty. Just submit an unterminated (empty) batch, and
> >it will walk from 0 to 1<<48bit and around and around and around and
> >around...
> 
> I chose to jump instead of just leaving the batch unterminated to
> cover the (rare) case where the rest of the allocated 4k of the
> batch contain some random values, which could cause a hang and thus
> falsely pass the test.

That would be a huge kernel bug. Freshly allocated buffers have to be
zero to avoid information leaks. I hope you are confusing allocating
from the userspace buffer cache with a fresh kernel allocation...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre