[PATCH i-g-t v4 1/1] tests/intel/xe_exec_capture: Add xe_exec_capture test

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Wed Nov 13 07:48:42 UTC 2024


On Tue, 2024-10-22 at 09:33 -0700, Zhanjun Dong wrote:
> Test with GuC reset, check if devcoredump register dump is within the
alan: nit: more clarification? -> "Submit cmds to the GPU that result in a GuC engine reset, and check that the devcoredump
register dump is generated by the GuC and includes the full register range"?
> range.
> 
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> ---
>  tests/intel/xe_exec_capture.c | 365 ++++++++++++++++++++++++++++++++++
>  tests/meson.build             |   1 +
>  2 files changed, 366 insertions(+)
>  create mode 100644 tests/intel/xe_exec_capture.c
> 
> diff --git a/tests/intel/xe_exec_capture.c b/tests/intel/xe_exec_capture.c
> new file mode 100644
> index 000000000..3a8fa68f2
> --- /dev/null
> +++ b/tests/intel/xe_exec_capture.c
> @@ -0,0 +1,365 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +/**
> + * TEST: Basic tests for GuC based register capture
> + * Category: Core
> + * Mega feature: General Core features
> + * Sub-category: CMD submission
> + * Functionality: Debug
> + * Test category: functionality test
> + */
> +
> +#include <ctype.h>
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <regex.h>
alan: nit: alphabetical order? (I'm not sure if this is a hard rule for std headers)
> +
> +#include "igt.h"
> +#include "igt_device.h"
> +#include "lib/igt_syncobj.h"
> +#include "lib/intel_reg.h"
> +#include "xe_drm.h"
> +#include "xe/xe_ioctl.h"
> +#include "xe/xe_query.h"
> +#include "xe/xe_spin.h"
> +
> +#define MAX_N_EXECQUEUES               16
> +#define MAX_INSTANCE                   9
> +#define GT_RESET                       (0x1 << 0)
> +#define CLOSE_FD                       (0x1 << 1)
> +#define CLOSE_EXEC_QUEUES              (0x1 << 2)
> +#define VIRTUAL                                (0x1 << 3)
> +#define PARALLEL                       (0x1 << 4)
> +#define CAT_ERROR                      (0x1 << 5)
> +
> +#define BASE_ADDRESS                   0x1a0000
> +/* Batch buffer element count, in number of dwords(uint32_t) */
> +#define BATCH_DW_COUNT                 16
> +
> +#define MAX_PATH_NAME_LEN              512
> +#define MAX_LINES                      4096
> +/* Keep line content up to this length */
> +#define MAX_LINE_LEN                   1024
> +#define MAIN_BUF_SIZE                  (MAX_LINES * MAX_LINE_LEN * sizeof(char))
> +/* Buffer size to read a line */
> +#define LINE_BUF_SIZE                  (64 * 1024)
> +
alan: the above set of definitions is used for the sole purpose of
getting all of the devcoredump file into a buffer that you can then scan
for specific keywords with strstr in get_coredump_item. However, instead
of adding all these definitions and creating alloc_lines_buffer, load_all and get_coredump_item,
why not reuse the igt_sysfs_get function? Is there a reason we are not using that?

Alternatively, I am wondering if we can consider the top part of the devcoredump layout
persistent, in which case maybe we don't even need to load so much of the devcoredump.
I.e. if we don't get any variable-length printouts or buffer-object dumps
before the guc error capture, then we could simply read the minimal buffer size needed
to get the initial keywords of the guc error-capture dump.
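For reference, the single-buffer alternative could look roughly like the sketch below. This is a reviewer-side illustration using plain stdio rather than IGT helpers; read_bounded and the size bound are made up for the example, not names from the patch or the IGT library:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read up to max_size - 1 bytes of a (possibly non-seekable)
 * sysfs-style file into one NUL-terminated buffer, so the caller
 * can strstr() over the whole dump without a line array.
 */
static char *read_bounded(const char *path, size_t max_size)
{
	FILE *f = fopen(path, "r");
	char *buf;
	size_t n;

	if (!f)
		return NULL;

	buf = malloc(max_size);
	if (!buf) {
		fclose(f);
		return NULL;
	}

	n = fread(buf, 1, max_size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return buf;
}
```

The same strstr-based key lookup then works directly on the returned buffer.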


> +#define DUMP_PATH                      "/sys/class/drm/card%d/device/devcoredump/data"
> +#define START_TAG                      "**** Job ****"
> +#define REGEX_KEY_VALUE_PAIR           "^[ \t]*([^:]+):[ \t]*([^ \t]+)[ \t]*$"
> +#define REGEX_KEY_INDEX                        1
> +#define REGEX_VALUE_INDEX              2
> +#define REGEX_KEY_VALUE_GROUP_COUNT    3
> +
> +static void
> +test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
> +                int n_exec_queues, int n_execs, unsigned int flags)
> +
alan: just a question: why are you calling it "test_legacy_mode"?
I know you are re-using this function from other tests,
but I'm curious about the history of this name.

alan: future follow-up: as a side note, we should create a separate task
item to consolidate the multiple instances of this function currently
being used by other tests and re-purpose it as an xe library helper.
That will surely require additional in/out params and more input validation.
However, since we need this test to ensure guc-error-capture gets tested
in CI a little urgently, let's create a separate internal task for that.
> {
> +       uint32_t vm;
> +       const uint64_t addr = BASE_ADDRESS;
> +       struct drm_xe_sync sync[2] = {
> +               { .type = DRM_XE_SYNC_TYPE_SYNCOBJ, .flags = DRM_XE_SYNC_FLAG_SIGNAL, },
> +               { .type = DRM_XE_SYNC_TYPE_SYNCOBJ, .flags = DRM_XE_SYNC_FLAG_SIGNAL, },
> +       };
> +       struct drm_xe_exec exec = {
> +               .num_batch_buffer = 1,
> +               .num_syncs = 2,
> +               .syncs = to_user_pointer(sync),
> +       };
> +       uint32_t exec_queues[MAX_N_EXECQUEUES];
> +       uint32_t syncobjs[MAX_N_EXECQUEUES];
> +       size_t bo_size;
> +       uint32_t bo = 0;
> +       struct {
> +               struct xe_spin spin;
> +               uint32_t batch[BATCH_DW_COUNT];
> +               uint64_t pad;
> +               uint32_t data;
> +       } *data;
> +       struct xe_spin_opts spin_opts = { .preempt = false };
> +       int i, b;
> +
> +       igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
> +
> +       vm = xe_vm_create(fd, 0, 0);
> +       bo_size = sizeof(*data) * n_execs;
> +       bo_size = xe_bb_size(fd, bo_size);
> +
> +       bo = xe_bo_create(fd, vm, bo_size,
> +                         vram_if_possible(fd, eci->gt_id),
> +                         DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM);
> +       data = xe_bo_map(fd, bo, bo_size);
> +
> +       for (i = 0; i < n_exec_queues; i++) {
> +               exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
> +               syncobjs[i] = syncobj_create(fd, 0);
> +       };
> +
> +       sync[0].handle = syncobj_create(fd, 0);
> +       xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
> +
alan: nit, so you can ignore, but it would be really nice if the flow were
summarized in a concise comment. Perhaps better to do that when we get
to the consolidation into a helper in future.

> +       for (i = 0; i < n_execs; i++) {
> +               uint64_t base_addr = addr;
alan: I see that for every single batch on all engines (as per the caller that loops
calling this "test_legacy_mode" per engine), we always use the same batch address.

However, with the same batch address across all runs of this loop, we have no way
to catch the following:

1. If the devcoredump we retrieve contains a guc-error-capture that doesn't
correlate with the batch we were expecting to get reset (i.e. the first
batch of a given queue should be the one getting reset, but we end up with another one),
we need to flag that as a warning or failure. (I'm assuming the execution of batches
on a given queue is guaranteed to be in order.)

2. If the devcoredump we retrieve contains a guc-error-capture for some other
engine, such as the prior engine that was being tested, we should also
flag that as a warning or failure.

That said, I think the batch base address really should never be the same for any
two engines, and never the same for any two batches. So maybe... ->

	const uint64_t addr = BASE_ADDRESS + (engine_id * (0x100));

	... then in the loop...
	int e = i % n_exec_queues;
	uint64_t base_addr = addr + e * 0x8;

This way, when we get a devcoredump, we can ensure that the batch address
matches the engine_id we just tested and that its last nibble is always 0x8?
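Spelled out as a self-contained sketch, the address scheme suggested above might look like this. The 0x100 and 0x8 strides are illustrative values, not requirements, and batch_base is a hypothetical helper name:

```c
#include <stdint.h>

#define BASE_ADDRESS 0x1a0000

/*
 * Illustrative address scheme: give each engine its own 0x100-spaced
 * window and each queue within it a distinct 0x8-spaced slot, so a
 * batch address seen in a devcoredump identifies both the engine and
 * the queue that submitted it.
 */
static uint64_t batch_base(int engine_id, int i, int n_exec_queues)
{
	uint64_t addr = BASE_ADDRESS + (uint64_t)engine_id * 0x100;
	int e = i % n_exec_queues;

	return addr + (uint64_t)e * 0x8;
}
```

check_capture_out could then assert that ACTHD/RING_BBADDR fall inside the window belonging to the engine under test rather than anywhere above BASE_ADDRESS.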

> +               uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
> +               uint64_t batch_addr = base_addr + batch_offset;
> +               uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> +               uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
> +               uint64_t sdi_addr = base_addr + sdi_offset;
> +               uint64_t exec_addr;
> +               int e = i % n_exec_queues;
> +
> +               if (!i) {
> +                       spin_opts.addr = base_addr + spin_offset;
> +                       xe_spin_init(&data[i].spin, &spin_opts);
> +                       exec_addr = spin_opts.addr;
> +               } else {
> +                       b = 0;
> +                       data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> +                       data[i].batch[b++] = sdi_addr;
> +                       data[i].batch[b++] = sdi_addr >> 32;
> +                       data[i].batch[b++] = 0xc0ffee;
> +                       data[i].batch[b++] = MI_BATCH_BUFFER_END;
> +                       igt_assert(b <= ARRAY_SIZE(data[i].batch));
> +
> +                       exec_addr = batch_addr;
> +               }
> +
> +               sync[0].flags &= ~DRM_XE_SYNC_FLAG_SIGNAL;
> +               sync[1].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
> +               sync[1].handle = syncobjs[e];
> +
> +               exec.exec_queue_id = exec_queues[e];
> +               exec.address = exec_addr;
> +               if (e != i)
> +                       syncobj_reset(fd, &syncobjs[e], 1);
> +               xe_exec(fd, &exec);
> +       }
alan: so this code is new to me - to help me understand, can you confirm
I've got this right in terms of what the above loop and the sync-waits below are doing?:
1. We send n_execs batches across n_exec_queues queues (on the same engine).
2. Then, below, we wait on each one of those batches to start.
3. And finally we wait for the vm to unbind?

However, I notice from the caller that you are only using a batch count of one and a queue count of one.
That said, I am kind of wondering: if the batch buffer is running a spinner, and if this igt
test could potentially be running alone without any other workloads, then how would a reset be
triggered? I thought that if we only have one workload running with nothing else being queued,
the GuC won't have a reason to preempt the work. I also thought we don't have a heartbeat in xe...
am I mistaken? How do we guarantee that an engine reset occurs?

Let's connect offline as I am a bit lost in some of this code.
> +
> +       for (i = 0; i < n_exec_queues && n_execs; i++)
> +               igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
> +                                       NULL));
> +       igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> +
> +       sync[0].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
> +       xe_vm_unbind_async(fd, vm, 0, 0, addr, bo_size, sync, 1);
> +       igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> +
> +       syncobj_destroy(fd, sync[0].handle);
> +       for (i = 0; i < n_exec_queues; i++) {
> +               syncobj_destroy(fd, syncobjs[i]);
> +               xe_exec_queue_destroy(fd, exec_queues[i]);
> +       }
> +
> +       munmap(data, bo_size);
> +       gem_close(fd, bo);
> +       xe_vm_destroy(fd, vm);
> +}
> +
> +static char **alloc_lines_buffer(void)
> +{
> +       int i;
> +       char **lines = (char **)malloc(MAX_LINES * sizeof(char *));
> +       char *main_buf =  (char *)malloc(MAIN_BUF_SIZE);
> +
> +       igt_assert_f(lines, "Out of memory.\n");
> +       igt_assert_f(main_buf, "Out of memory.\n");
> +
> +       /* set string array pointers */
> +       for (i = 0; i < MAX_LINES; i++)
> +               lines[i] = main_buf + i * MAX_LINE_LEN;
> +
> +       return lines;
> +}
> +
> +static char *get_devcoredump_path(int card_id, char *buf)
> +{
> +       sprintf(buf, DUMP_PATH, card_id);
> +       return buf;
> +}
> +
> +static void load_all(FILE *fd, char **lines, char *buf)
> +{
> +       int i = 0;
> +
> +       memset(lines[0], 0, MAIN_BUF_SIZE);
> +       while (!feof(fd) && i < MAX_LINES) {
> +               /*
> +                * Devcoredump might have long lines, load up to
> +                * LINE_BUF_SIZE for a single line, buffer reused
> +                */
> +               fgets(buf, LINE_BUF_SIZE, fd);
> +               /* Only save up to MAX_LINE_LEN to buffer */
> +               strncpy(lines[i++], buf, MAX_LINE_LEN);
> +       }
> +}
> +
> +static bool access_devcoredump(char *path, char **lines, char *line_buf)
> +{
> +       FILE *fd = fopen(path, "r");
> +
> +       if (!fd)
> +               return false;
> +
> +       igt_debug("Devcoredump found: %s\n", path);
> +
> +       /* Clear memory before load file */
> +       if (lines)
> +               load_all(fd, lines, line_buf);
> +
> +       fclose(fd);
> +       return true;
> +}
> +
> +static bool rm_devcoredump(char *path)
> +{
> +       int fd = open(path, O_WRONLY);
> +
> +       if (fd != -1) {
> +               igt_debug("Clearing devcoredump.\n");
> +               write(fd, "0", 1);
> +               close(fd);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +static char *get_coredump_item(regex_t *regex, char **lines, const char *tag)
> +{
> +       int i;
> +       regmatch_t match[REGEX_KEY_VALUE_GROUP_COUNT];
> +
> +       for (i = 0; i < MAX_LINES; i++) {
> +               char *line = lines[i];
> +
> +               /* Skip lines without tag */
> +               if (!strstr(line, tag))
> +                       continue;
> +
> +               if (regexec(regex, line, REGEX_KEY_VALUE_GROUP_COUNT, match, 0) == 0) {
> +                       char *key, *value;
> +
> +                       key = &line[match[REGEX_KEY_INDEX].rm_so];
> +                       line[match[REGEX_KEY_INDEX].rm_eo] = '\0';
> +                       value = &line[match[REGEX_VALUE_INDEX].rm_so];
> +                       line[match[REGEX_VALUE_INDEX].rm_eo] = '\0';
> +
> +                       if (strcmp(tag, key) == 0)
> +                               return value;
> +                       /* if key != tag,  keep searching and loop to next line */
> +               }
> +       }
> +
> +       return NULL;
> +}
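alan: side note, for anyone verifying the key/value regex above: it can be sanity-checked standalone with plain POSIX regex. This is a reviewer-side sketch (parse_kv is a hypothetical helper, not part of the patch):

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Same pattern as the patch: "Key: Value" with optional whitespace. */
#define REGEX_KEY_VALUE_PAIR "^[ \t]*([^:]+):[ \t]*([^ \t]+)[ \t]*$"

/*
 * Extract the key and value from a "Key: Value" line into the
 * caller's buffers. Returns 0 on a successful match.
 */
static int parse_kv(const char *line, char *key, char *val, size_t len)
{
	regex_t re;
	regmatch_t m[3];
	int ret = regcomp(&re, REGEX_KEY_VALUE_PAIR, REG_EXTENDED | REG_NEWLINE);

	if (ret)
		return ret;

	ret = regexec(&re, line, 3, m, 0);
	if (!ret) {
		snprintf(key, len, "%.*s",
			 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
		snprintf(val, len, "%.*s",
			 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
	}
	regfree(&re);
	return ret;
}
```

E.g. "Capture_source: GuC" should yield key "Capture_source" and value "GuC", matching what check_item_str expects.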
> +
> +static void check_item_u64(regex_t *regex, char **lines, const char *tag, uint64_t addr_lo,
> +                          uint64_t addr_hi)
> +{
> +       uint64_t result;
> +       char *output;
> +
> +       igt_assert_f(output = get_coredump_item(regex, lines, tag), "Target not found:%s", tag);
> +       result = strtoul(output, NULL, 16);
> +       igt_debug("Compare %s: %s vs [0x%lX-0x%lX]\n", tag, output, addr_lo, addr_hi);
> +       igt_assert_f((addr_lo <= result) && (result <= addr_hi),
> +                    "value %lX out of range[0x%lX-0x%lX]\n", result, addr_lo, addr_hi);
> +}
> +
> +static void check_item_str(regex_t *regex, char **lines, const char *tag, const char *target)
> +{
> +       char *output;
> +
> +       igt_assert_f(output = get_coredump_item(regex, lines, tag), "Target not found:%s", tag);
> +       igt_debug("Compare %s: %s vs %s\n", tag, output, target);
> +       igt_assert_f(!strcmp(output, target), "Expected value:%s, received:%s\n", target, output);
> +}
> +
> +static void check_capture_out(regex_t *regex, char **lines)
> +{
> +       check_item_str(regex, lines, "Capture_source", "GuC");
> +       check_item_u64(regex, lines, "ACTHD", BASE_ADDRESS,
> +                      BASE_ADDRESS + BATCH_DW_COUNT * sizeof(uint32_t));
> +       check_item_u64(regex, lines, "RING_BBADDR", BASE_ADDRESS,
> +                      BASE_ADDRESS + BATCH_DW_COUNT * sizeof(uint32_t));
alan: why are we not checking the engine instance - i.e. ensuring the drm_queue's engine (the one we just
hung) matches what the devcoredump dumped?

> +}
> +
> +/**
> + * SUBTEST: reset
> + * Description: Reset GuC, check devcoredump output values
> + */
> +static void test_card(int fd)
alan: nit: maybe rename from "test_card" to something a bit more descriptive like
"trigger_hwe_resets_and_verify_devcoredump"?
> +{
> +       struct drm_xe_engine_class_instance *hwe;
> +       regex_t regex;
> +       char **lines;
> +       char *single_line_buf =  (char *)malloc(LINE_BUF_SIZE);
> +       char path[MAX_PATH_NAME_LEN];
> +
> +       igt_assert_f(single_line_buf, "Out of memory.\n");
> +
> +       regcomp(&regex, REGEX_KEY_VALUE_PAIR, REG_EXTENDED | REG_NEWLINE);
> +       get_devcoredump_path(igt_device_get_card_index(fd), path);
> +       lines = alloc_lines_buffer();
> +
> +       /* clear old devcoredump, if any */
> +       rm_devcoredump(path);
> +
> +       xe_for_each_engine(fd, hwe) {
> +               igt_debug("Running on engine class: %x instance: %x\n", hwe->engine_class,
> +                         hwe->engine_instance);
> +               test_legacy_mode(fd, hwe, 1, 1, 0);
alan: nit: maybe rename test_legacy_mode to trigger_reset?
Actually I wonder if this entire subtest can reuse the
> +
> +               /* assert devcoredump created */
> +               igt_assert_f(access_devcoredump(path, lines, single_line_buf),
> +                            "Devcoredump not exist, errno=%d.\n", errno);
> +
> +               check_capture_out(&regex, lines);
> +
> +               /* clear devcoredump */
> +               rm_devcoredump(path);
> +               sleep(1);
> +               /* Assert devcoredump removed */
> +               igt_assert_f(!access_devcoredump(path, NULL, NULL), "Devcoredump not removed\n");
> +       }
> +       /* Free lines buffer */
> +       free(lines);
> +       free(single_line_buf);
> +       regfree(&regex);
> +}
> +
> +igt_main
> +{
> +       int xe;
> +
> +       igt_fixture
> +               xe = drm_open_driver(DRIVER_XE);
> +
> +       igt_subtest("reset") {
> +               int gpu_count = drm_prepare_filtered_multigpu(DRIVER_XE);
> +
> +               igt_require(xe > 0);
> +               if (gpu_count >= 2) {
> +                       igt_multi_fork(child, gpu_count) {
alan: if we want to fork, then we ought to ensure that any igt_debug/warn/error/print calls
that report important info (during the batch-buffer execution and the retrieval + verification of
the devcoredumps) include a way to differentiate which fd we are referring to.
Please ignore this comment if igt_debug already has that built-in.
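If igt_debug has no such built-in, the cheapest fix is probably tagging messages manually. A tiny illustrative helper (tag_gpu_msg is a made-up name, not an IGT API):

```c
#include <stdio.h>

/*
 * Illustrative only: prefix a log line with the GPU index so
 * interleaved output from igt_multi_fork() children can be
 * attributed to a specific card.
 */
static int tag_gpu_msg(char *out, size_t len, int gpu_idx, const char *msg)
{
	return snprintf(out, len, "[gpu-%d] %s", gpu_idx, msg);
}
```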
> +                               int gpu_fd;
> +
> +                               gpu_fd = drm_open_filtered_card(child);
> +                               igt_assert_f(gpu_fd > 0, "cannot open gpu-%d, errno=%d\n", child,
> +                                            errno);
> +                               igt_assert(is_xe_device(gpu_fd));
> +
> +                               test_card(gpu_fd);
> +                               drm_close_driver(gpu_fd);
> +                       }
> +                       igt_waitchildren();
> +               } else {
> +                       test_card(xe);
> +               }
> +       }
> +
> +       igt_fixture
> +               drm_close_driver(xe);
> +}
> diff --git a/tests/meson.build b/tests/meson.build
> index 2d8cb87d5..b724a7c6d 100644
> --- a/tests/meson.build
> +++ b/tests/meson.build
> @@ -284,6 +284,7 @@ intel_xe_progs = [
>         'xe_exec_atomic',
>         'xe_exec_balancer',
>         'xe_exec_basic',
> +       'xe_exec_capture',
>         'xe_exec_compute_mode',
>         'xe_exec_fault_mode',
>         'xe_exec_mix_modes',
> -- 
> 2.34.1
> 
