[PATCH v5] tests/intel/xe_exec_capture: Add xe_exec_capture test

Dong, Zhanjun zhanjun.dong at intel.com
Thu Nov 21 17:42:20 UTC 2024



On 2024-11-20 6:32 p.m., Zhanjun Dong wrote:
> Submit cmds to the GPU that result in a GuC engine reset and check that
> devcoredump register dump is generated, by the GuC, and includes the
> full register range.
> 
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> Cc: Peter Senna Tschudin <peter.senna at intel.com>
> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> 
> Changes from prior revs:
>   v5:-  Detect that the devcoredump matches the engine under test
>         The engine runs with a random cid
>   v4:-  Support runs on multiple GPUs
>         Load all devcoredump content into a buffer
>         Allocate the line buffer dynamically instead of static global memory
>         Changed to igt_assert_f to provide more info on failure
>   v3:-  Remove call to bash and awk
>         Add regular expression parsing
>         Detect devcoredump through card index
>         Add devcoredump removal check
>   v2:-  Fix CI.build error
>         Add multiple GPU card support
> ---
>   tests/intel/xe_exec_capture.c | 462 ++++++++++++++++++++++++++++++++++
>   tests/meson.build             |   1 +
>   2 files changed, 463 insertions(+)
>   create mode 100644 tests/intel/xe_exec_capture.c
> 
> diff --git a/tests/intel/xe_exec_capture.c b/tests/intel/xe_exec_capture.c
> new file mode 100644
> index 000000000..33b92bb71
...
> +
> +static void
> +test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci, int n_exec_queues, int n_execs,
> +		 unsigned int flags, u64 addr)
> +{
> +	u32 vm;
> +	struct drm_xe_sync sync[2] = {
> +		{ .type = DRM_XE_SYNC_TYPE_SYNCOBJ, .flags = DRM_XE_SYNC_FLAG_SIGNAL, },
> +		{ .type = DRM_XE_SYNC_TYPE_SYNCOBJ, .flags = DRM_XE_SYNC_FLAG_SIGNAL, },
> +	};
> +	struct drm_xe_exec exec = {
> +		.num_batch_buffer = 1,
> +		.num_syncs = 2,
> +		.syncs = to_user_pointer(sync),
> +	};
> +	u32 exec_queues[MAX_N_EXECQUEUES];
> +	u32 syncobjs[MAX_N_EXECQUEUES];
> +	size_t bo_size;
> +	u32 bo = 0;
> +	struct {
> +		struct xe_spin spin;
> +		u32 batch[BATCH_DW_COUNT];
> +		u64 pad;
> +		u32 data;
> +	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
> +	int i, b;
> +
> +	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
> +
> +	vm = xe_vm_create(fd, 0, 0);
> +	bo_size = sizeof(*data) * n_execs;
> +	bo_size = xe_bb_size(fd, bo_size);
> +
> +	bo = xe_bo_create(fd, vm, bo_size,
> +			  vram_if_possible(fd, eci->gt_id),
> +			  DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM);
> +	data = xe_bo_map(fd, bo, bo_size);
> +
> +	for (i = 0; i < n_exec_queues; i++) {
> +		exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
> +		syncobjs[i] = syncobj_create(fd, 0);
> +	}
> +
> +	sync[0].handle = syncobj_create(fd, 0);
> +	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
> +
> +	for (i = 0; i < n_execs; i++) {
> +		u64 base_addr = addr;
> +		u64 batch_offset = (char *)&data[i].batch - (char *)data;
> +		u64 batch_addr = base_addr + batch_offset;
> +		u64 spin_offset = (char *)&data[i].spin - (char *)data;
> +		u64 sdi_offset = (char *)&data[i].data - (char *)data;
> +		u64 sdi_addr = base_addr + sdi_offset;
> +		u64 exec_addr;
> +		int e = i % n_exec_queues;
> +
> +		if (!i) {
> +			spin_opts.addr = base_addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
> +			exec_addr = spin_opts.addr;
> +		} else {
> +			b = 0;
> +			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> +			data[i].batch[b++] = sdi_addr;
> +			data[i].batch[b++] = sdi_addr >> 32;
> +			data[i].batch[b++] = 0xc0ffee;
> +			data[i].batch[b++] = MI_BATCH_BUFFER_END;
> +			igt_assert(b <= ARRAY_SIZE(data[i].batch));
> +
> +			exec_addr = batch_addr;
> +		}
> +
> +		sync[0].flags &= ~DRM_XE_SYNC_FLAG_SIGNAL;
> +		sync[1].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
> +		sync[1].handle = syncobjs[e];
> +
> +		exec.exec_queue_id = exec_queues[e];
> +		exec.address = exec_addr;
> +		if (e != i)
> +			syncobj_reset(fd, &syncobjs[e], 1);
> +		xe_exec(fd, &exec);
> +	}
> +
> +	for (i = 0; i < n_exec_queues && n_execs; i++)
> +		igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
> +					NULL));
> +	igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> +
> +	sync[0].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
> +	xe_vm_unbind_async(fd, vm, 0, 0, addr, bo_size, sync, 1);
> +	igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> +
> +	syncobj_destroy(fd, sync[0].handle);
> +	for (i = 0; i < n_exec_queues; i++) {
> +		syncobj_destroy(fd, syncobjs[i]);
> +		xe_exec_queue_destroy(fd, exec_queues[i]);
> +	}
> +
> +	munmap(data, bo_size);
> +	gem_close(fd, bo);
> +	xe_vm_destroy(fd, vm);
> +}
Alan raised concerns about this function:
1. Move it into the igt lib, since a similar function appears in
xe_exec_reset and xe_exec_thread.
2. This test runs with fixed values for the n_exec_queues, n_execs and
flags parameters; maybe the code could be simplified accordingly.

For #1, the function body differs in the other two files, and for this
test I also added addr as a parameter, which is a further difference, so
moving it into the igt lib would need additional review. That could be
the next step.
For #2, if the function can be moved to the igt lib later, is it worth
doing the optimization here? Shall we land the test first?
Since these two concerns are not yet resolved and the direction is not
clear, I will leave the function as is for now.
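If the helper does eventually move into a shared location, one possible shape could be the following; note that the name xe_exec_legacy_mode and the lib/xe/ location are hypothetical, not existing IGT API:

```c
/* Hypothetical shared-library prototype, e.g. somewhere under lib/xe/.
 * The extra addr parameter is the difference noted above versus the
 * copies in xe_exec_reset and xe_exec_thread. */
void xe_exec_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
			 int n_exec_queues, int n_execs,
			 unsigned int flags, uint64_t addr);
```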

Regards,
Zhanjun Dong
...


More information about the igt-dev mailing list