[Intel-gfx] [PATCH] igt/gem_request_retire: Provoke context destruction with active VMAs

Thu Nov 19 13:02:46 PST 2015

On Thu, Nov 19, 2015 at 05:24:03PM +0000, Tvrtko Ursulin wrote:
> >>+		while (i % 4)
> >>+			batch[i++] = MI_NOOP;
> 
> Is this ok? I arrived at it by experimentation, kernel was
> complaining that relocations are not 4-byte aligned without it.

batch is u32, so all is good. What the kernel would be complaining about
would be that the batch must be quadword aligned in length I guess.

> Although now I'm thinking offsets are in bytes and my calculation
> below is completely wrong.. but it did not crash the GPU. Hm.

> 
> >>+
> >>+		if (blit_len == 0)
> >>+			blit_len = i;
> >>+	}
> >>+
> >>+	batch[i++] = MI_BATCH_BUFFER_END;
> >>+	batch[i++] = MI_NOOP;
> >>+
> >>+	handle = gem_create(fd, sizeof(batch));
> >>+	gem_write(fd, handle, 0, batch, sizeof(batch));
> >>+
> >>+	for (j = 0; j < copies; j++) {
> >>+		unsigned int r0 = j * 2;
> >>+		unsigned int r1 = r0 + 1;
> >>+
> >>+		reloc[r0].target_handle = dst;
> >>+		reloc[r0].delta = 0;
> >>+		reloc[r0].offset = j * blit_len + 4 * sizeof(uint32_t);
> >>+		reloc[r0].presumed_offset = 0;
> >>+		reloc[r0].read_domains = I915_GEM_DOMAIN_RENDER;
> >>+		reloc[r0].write_domain = I915_GEM_DOMAIN_RENDER;
> >>+
> >>+		reloc[r1].target_handle = src;
> >>+		reloc[r1].delta = 0;
> >>+		reloc[r1].offset = j * blit_len + 7 * sizeof(uint32_t);
> >>+		if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> >>+			reloc[r1].offset += sizeof(uint32_t);
> >>+		reloc[r1].presumed_offset = 0;
> >>+		reloc[r1].read_domains = I915_GEM_DOMAIN_RENDER;
> >>+		reloc[r1].write_domain = 0;
> >
> >Just fill the relocation array as you generate the batch.
> 
> To save one loop? :)

It was more to do with getting the offsets correct. The blit_len looks
very fragile and confusing.
> 
> >>+	}
> >>+
> >>+	memset(obj, 0, sizeof(obj));
> >>+	exec.buffer_count = 0;
> >>+	obj[exec.buffer_count++].handle = dst;
> >>+	if (src != dst)
> >>+		obj[exec.buffer_count++].handle = src;
> >>+	obj[exec.buffer_count].handle = handle;
> >>+	obj[exec.buffer_count].relocation_count = 2 * copies;
> >>+	obj[exec.buffer_count].relocs_ptr = (uintptr_t)reloc;
> >>+	exec.buffer_count++;
> >>+	exec.buffers_ptr = (uintptr_t)obj;
> >>+
> >
> >memset(&exec, 0, sizeof(exec));
> >instead of:
> >>+	exec.batch_start_offset = 0;
> >>+	exec.batch_len = i * sizeof(uint32_t);
> >>+	exec.DR1 = exec.DR4 = 0;
> >>+	exec.num_cliprects = 0;
> >>+	exec.cliprects_ptr = 0;
> >>+	exec.flags = I915_EXEC_BLT;
> >>+	i915_execbuffer2_set_context_id(exec, ctx_id);
> >>+	exec.rsvd2 = 0;
> >>+
> >>+	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &exec);
> >>+	igt_assert_eq(ret, 0);
> >gem_execbuf(fd, &exec);
> 
> Okay, it was copy&paste. :)

I haven't killed them all yet.

> >For the purposes of your test, you don't even need to execute anything,
> >just associate the dst handle with an empty batch on the render ring.
> >You don't even need a fake relocation since the test already requires a
> >fairly recent kernel.
> 
> I thought so but couldn't get it to work. Kept getting EINVAL which
> is a pig to find the real reason with execbuf. I'll try some more.
> 
> >
> >>+
> >>+	return handle;
> >>+}
> >>+
> >>+/*
> >>+ * A single bo is operated from batchbuffers submitted from two contexts and on
> >>+ * different rings.
> >>+ * One execbuf finishes way ahead of the other at which point the respective
> >>+ * context is destroyed.
> >>+ */
> >>+static void
> >>+test_retire_vma_not_inactive(int fd)
> >>+{
> >>+	uint32_t ctx_id;
> >>+	uint32_t src, dst, dst2;
> >>+	uint32_t blit_bb, store_bb;
> >>+
> >>+	igt_require(HAS_BLT_RING(intel_get_drm_devid(fd)));
> >>+
> >>+	ctx_id = gem_context_create(fd);
> >>+	igt_assert(ctx_id);
> >>+
> >>+	/* Create some bos batch buffers will operate on. */
> >>+	src = gem_create(fd, BO_SIZE);
> >>+	igt_assert(src);
> >>+	dst = gem_create(fd, BO_SIZE);
> >>+	igt_assert(dst);
> >>+	dst2 = gem_create(fd, 4096);
> >>+	igt_assert(dst2);
> >>+
> >>+	/* Submit a long running batch. */
> >>+	blit_bb = blit(fd, dst, src, 0);
> >>+	/* Submit a quick batch referencing the same object. */
> >>+	store_bb = store(fd, dst2, src, ctx_id);
> >>+	/* Wait for the quick batch to complete. */
> >>+	gem_sync(fd, store_bb);
> >>+	gem_sync(fd, dst2);
> >>+	gem_close(fd, store_bb);
> >
> >>+	/* Do it again to ensure in kernel retirement is triggered. */
> >Circumstantial!
> 
> Yes I don't think it is actually required but a leftover from development.
> 
> Key is that retire work handler runs before the context destruction,
> since execlist retire is only runs from there, and it is keeping
> references to requests which have been processed long ago.
> 
> So I was also thinking - should we call the execlist retire requests
> more often in general? Execbuf path calls the normal ring retirement
> so maybe add it there?

No. execbuf will not be calling ring retire itself in future. That is to
handle an igt starvation issue, but it has interesting implications wrt
overhead. Similarly, there are technical reasons why execlists retire
has to be called infrequently atm (I guess it was just lucky bad
placement during development), call execlists retire too often and you
end up having to repin the contexts every batch.

The bug is not peculiar to execlists. You can hit it elsewhere as the
last ref is dropped in i915_gem_object_retire__read before the object is
not marked as inactive.

> >>+	store_bb = store(fd, dst2, src, ctx_id);
> >>+	gem_sync(fd, store_bb);
> >>+	gem_sync(fd, dst2);
> >>+	gem_close(fd, store_bb);
> >>+	gem_close(fd, dst2);
> >>+
> >>+	/* Now destroy the context in which the quick batch was submitted. */
> >>+	gem_context_destroy(fd, ctx_id);
> >
> >Feels very fragile. Isn't the bug something like
> 
> Well my blit runs for 6-7 seconds on a fast SKL so only fragility is
> that retire work handler manages to get scheduled during that
> window. So I think it is fine.
> 
> >batch = big_render_copy(ctx);
> >context_close(ctx);
> >gem_sync(batch);

That actually has to be an explicit gem_set_domain() not gem_sync().

> I don't think so. It needs two contexts so that object can be active
> on two rings and we can destroy one context with another one still
> active.

That's not the same bug I see I guess.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre