[PATCH 2/2] tests/intel: Add xe_coredump test
Kamil Konieczny
kamil.konieczny at linux.intel.com
Mon Jan 15 15:47:49 UTC 2024
Hi Maarten,
On 2024-01-15 at 13:34:33 +0100, Maarten Lankhorst wrote:
> Hey,
>
> Den 2024-01-12 kl. 17:53, skrev Kamil Konieczny:
> > Hi Maarten,
> > On 2024-01-12 at 13:40:43 +0100, Maarten Lankhorst wrote:
> > > Add a simple test that forces a GPU hang and then reads the resulting
> > > devcoredump file. Map a single userptr and BO, and dump the contents of
> > > those.
> > >
> > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> > > ---
> > > tests/intel/xe_coredump.c | 188 ++++++++++++++++++++++++++++++++++++++
> > > tests/meson.build | 1 +
> > > 2 files changed, 189 insertions(+)
> > > create mode 100644 tests/intel/xe_coredump.c
> > >
> > > diff --git a/tests/intel/xe_coredump.c b/tests/intel/xe_coredump.c
> > > new file mode 100644
> > > index 000000000..9db79bb1d
> > > --- /dev/null
> > > +++ b/tests/intel/xe_coredump.c
> > > @@ -0,0 +1,188 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2023 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * TEST: Check devcoredump functionality
> > > + * Category: Software building block
> > > + * Sub-category: devcoredump
> > > + * Test category: functionality test
> > > + * Run type: BAT
> > > + * Description: Test devcoredump functionality and dumping work as intended.
> > Document also new subtest(s) here.
> >
> > > + */
> > > +
> > > +#include <string.h>
> > > +#include <unistd.h>
> > > +#include <fcntl.h>
> > > +#include <sys/stat.h>
> > > +#include <glob.h>
> > > +#include <wchar.h>
> > It should be sorted alphabetically but unistd.h is somewhat
> > special and should be first.
> Okay.
> >
> > > +
> > > +#include "igt.h"
> > > +#include "igt_device.h"
> > > +#include "igt_io.h"
> > > +#include "igt_syncobj.h"
> > > +#include "igt_sysfs.h"
> > > +
> > > +#include "intel_pat.h"
> > > +
> > > +#include "xe_drm.h"
> > > +#include "xe/xe_ioctl.h"
> > > +#include "xe/xe_query.h"
> > > +
> > > +static struct xe_device *xe;
> > > +static uint32_t batch_bo;
> > > +static uint32_t *batch;
> > > +static void *userptr;
> > > +static uint32_t vm;
> > > +
> > > +#define MAX_N_ENGINES 32
> > > +
> > > +/*
> > > + * Helper to read and clear devcore. We want to read it completely to ensure
> > > + * we catch any kernel side regressions like:
> > > + * https://gitlab.freedesktop.org/drm/msm/-/issues/20
> > > + */
> > > +
> > > +static void
> > > +read_and_clear_devcore(bool match)
> > --------------------------^^^^
> >
> > Could you get rid of bool param? It is better to write two
> > separate functions, one for clearing and one for dumping.
> This function does not really care what the contents of the coredump are,
> just that it can be read without a kernel panic.
Ok, but imho it should read it after clearing:
preparations:
clear all devcores
test:
casue hang
read devcore
What I was talking about was that preparation step, no need to
read it out in that step. For that you could write clear_devcore()
with only clearing it with the fast method and no igt_asserts() there.
If you prefere you could debug print which devcores you cleared
(e.g. which one were not empty before executing your test).
> > > +{
> > > + glob_t glob_buf = {0};
> > > + int ret, i;
> > > +
> > > + ret = glob("/sys/class/devcoredump/devcd*/data", GLOB_NOSORT, NULL, &glob_buf);
> > > + if ((ret == GLOB_NOMATCH) || !glob_buf.gl_pathc)
> > > + return;
> > > +
> > > + for (i = 0; i < glob_buf.gl_pathc; i++) {
> > > + char buf[0x1000];
> > > + int fd = open(glob_buf.gl_pathv[i], O_RDWR);
> > > +
> > > + if (fd == -1)
> > > + continue;
> > > +
> > > + /*
> > > + * We want to read the entire file but we can throw away the
> > > + * contents.. we just want to make sure that we exercise the
> > > + * kernel side codepaths hit when reading the devcore from
> > > + * sysfs
> > > + */
> > > + igt_debug("---- begin coredump ----\n");
> > > + do {
> > > + memset(buf, 0, sizeof(buf));
> > ----------- ^
> > It is better to do it once before loop.
>
> Yeah, except there may be multiple devcoredump capable devices, which is why
> we clear all at the beginning of the test.
Here I was talking about memset before reading into buf, it seems I should
look into igt_readn. Simplify it into:
do { /* read devcore and write it with igt_debug() */
ret = igt_readn(fd, buf, sizeof(buf) - 2);
if (ret <= 0)
break;
buf[ret + 1] = 0;
igt_debug("%s", buf);
} while(true);
Regards,
Kamil
>
> I should check the device path instead.
>
> >
> > > + ret = igt_readn(fd, buf, sizeof(buf) - 1);
> > > + igt_debug("%s", buf);
> > Consider limiting it to some reasonable size to not clobber CI.
> > If you want you could add option to test to write it all.
> >
> > > + } while (ret > 0);
> > > +
> > > + igt_debug("---- end coredump ----\n");
> > > +
> > > + /* Clear the devcore: */
> > > + igt_writen(fd, "1", 1);
> > Is it enough to write "1" to devcore to clear it? Or should it
> > be read out to get rid off the data?
> The readout is to ensure the actual dumping part works succesfully, or at
> least does not cause a kernel panic.
> > > +
> > > + close(fd);
> > > + match = false;
> > Better do it once, not in loop.
> >
> > > + }
> > > +
> > match = glob_buf.gl_pathc;
> > and later
> > igt_assert_f(match > 0, "No devcoredump found.\n");
> >
> > > + globfree(&glob_buf);
> > > + igt_assert_f(!match, "No devcoredump found.\n");
> > > +}
> > > +
> > > +/**
> > > + * SUBTEST: %s
> > -------------- ^
> > When using param you should later give its values but here it is
> > overcomplicating things, just name it:
> >
> > * SUBTEST: basic
> > > + * Description: Hang the GPU, and read out the resulting devcoredump.
> > -------------------------------^
> > No need for comma "," before "and".
> > Write in description what is the difference in second test.
> >
> > > + * Test category: functionality test
> > Put here description for second test:
> >
> > * SUBTEST: all-simultaneously
> >
> > with other fields filled.
> Yeah, wasn't sure how to add same description for 2 tests.
> > > + */
> > > +static void
> > > +do_hang_test(bool all)
> > > +{
> > > + uint32_t *ptr = batch;
> > > + uint64_t offset = xe->default_alignment - 4;
> > > + struct drm_xe_engine_class_instance *hwe;
> > > + uint32_t engines[MAX_N_ENGINES], syncobjs[MAX_N_ENGINES];
> > > + int i = 0;
> > > +
> > > + memset(batch, 0, xe->default_alignment);
> > > + *(ptr++) = MI_SEMAPHORE_WAIT | MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_GTE_SDD;
> > > + *(ptr++) = 1;
> > > + *(ptr++) = offset >> 32;
> > > + *(ptr++) = offset;
> > > + *(ptr++) = MI_BATCH_BUFFER_END;
> > > +
> > > + xe_for_each_engine(xe->fd, hwe) {
> > > + struct drm_xe_sync sync = {
> > > + .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
> > > + .flags = DRM_XE_SYNC_FLAG_SIGNAL,
> > > + };
> > > +
> > > + sync.handle = syncobjs[i] = syncobj_create(xe->fd, 0);
> > > + engines[i] = xe_exec_queue_create(xe->fd, vm, hwe, 0);
> > > + xe_exec_sync(xe->fd, engines[i], 0, &sync, 1);
> > > + i++;
> > > + if (!all)
> > > + break;
> > > + }
> > > +
> > > + /* Ensure all syncobjs are in a failed state */
> > > + while (i--) {
> > > + syncobj_wait(xe->fd, &syncobjs[i], 1, INT64_MAX, 0, NULL);
> > > + syncobj_destroy(xe->fd, syncobjs[i]);
> > > + }
> > > +}
> > > +
> > > +igt_main
> > > +{
> > > + igt_fixture {
> > > + struct drm_xe_sync sync = {
> > > + .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
> > > + .flags = DRM_XE_SYNC_FLAG_SIGNAL,
> > > + };
> > > + struct drm_xe_vm_bind_op bind_ops[2] = { };
> > > +
> > > + int fd = drm_open_driver_render(DRIVER_XE);
> > > + xe = xe_device_get(fd);
> > > +
> > > + /* Before test, ensure devcoredump is empty */
> > > + read_and_clear_devcore(false);
> > Better:
> > clear_devcore();
> >
> > > +
> > > + vm = xe_vm_create(fd, 0, 0);
> > > + batch_bo = xe_bo_create(fd, vm, xe->default_alignment, system_memory(fd), 0);
> > > + batch = xe_bo_map(xe->fd, batch_bo, xe->default_alignment);
> > > +
> > > + userptr = mmap(0, xe->default_alignment, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> > > + wmemset(userptr, 0xf1234567, xe->default_alignment / sizeof(wchar_t));
> > > +
> > > + bind_ops[0].op = DRM_XE_VM_BIND_OP_MAP;
> > > + bind_ops[0].obj = batch_bo;
> > > + bind_ops[0].addr = 0;
> > > +
> > > + bind_ops[1].op = DRM_XE_VM_BIND_OP_MAP_USERPTR;
> > > + bind_ops[1].userptr = (size_t)userptr;
> > > + bind_ops[1].addr = 1ULL << 40ULL;
> > > +
> > > + bind_ops[0].flags = bind_ops[1].flags = DRM_XE_VM_BIND_FLAG_DUMPABLE;
> > > + bind_ops[0].range = bind_ops[1].range = xe->default_alignment;
> > > + bind_ops[0].pat_index = bind_ops[1].pat_index = intel_get_pat_idx_wb(fd);
> > > +
> > > + sync.handle = syncobj_create(fd, 0);
> > > + xe_vm_bind_array(fd, vm, 0, bind_ops, ARRAY_SIZE(bind_ops), &sync, 1);
> > > + syncobj_wait(fd, &sync.handle, 1, INT64_MAX, 0, NULL);
> > > + syncobj_destroy(fd, sync.handle);
> > > +
> > > + }
> > > +
> > > + igt_describe("Test that hw fault coredump readout works");
> > > + igt_subtest("basic") {
> > > + do_hang_test(false);
> > > +
> > > + read_and_clear_devcore(true);
> > Better:
> > read_and_clear_devcore();
> >
> > > + }
> > > +
> > > + igt_describe("Hang all engines simultaneously");
> > > + igt_subtest("all-simultaneously") {
> > > + do_hang_test(true);
> > > +
> > > + read_and_clear_devcore(true);
> > Same here:
> > read_and_clear_devcore();
>
> Thanks, I'll try to clear up the testcase.
>
> Cheers,
>
> Maarten
>
More information about the igt-dev
mailing list