[PATCH i-g-t] tests/intel/xe_wedged: Introduce a new test for Xe device wedged state
Lucas De Marchi
lucas.demarchi at intel.com
Wed Mar 13 20:07:44 UTC 2024
On Wed, Mar 13, 2024 at 03:55:28PM -0400, Rodrigo Vivi wrote:
>Let's inject a gt_reset failure that will put Xe device in the
>new wedged state, then we confirm the IOCTL is blocked and we
>reload the driver to get back to a clean state for other test
>execution, since wedged state in Xe is a final state that can only
>be cleared with a module reload.
>
>This new test case is entirely based on xe_uevent provided by
>Himal.
I'm confused: I don't see any uevent handling here.
>
>Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>---
> tests/intel/xe_wedged.c | 91 +++++++++++++++++++++++++++++++++++++++++
> tests/meson.build | 1 +
> 2 files changed, 92 insertions(+)
> create mode 100644 tests/intel/xe_wedged.c
>
>diff --git a/tests/intel/xe_wedged.c b/tests/intel/xe_wedged.c
>new file mode 100644
>index 000000000..f767e2511
>--- /dev/null
>+++ b/tests/intel/xe_wedged.c
>@@ -0,0 +1,91 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2024 Intel Corporation
>+ */
>+
>+/**
>+ * TEST: cause fake gt reset failure which put Xe device in wedged state
>+ * Category: Software building block
>+ * Sub-category: driver
>+ * Functionality: wedged
>+ * Test category: functionality test
>+ */
>+
>+#include "igt.h"
>+#include "igt_kmod.h"
>+
>+#include "xe/xe_ioctl.h"
>+
>+static void force_wedged(int fd)
>+{
>+ igt_debugfs_write(fd, "fail_gt_reset/probability", "100");
>+ igt_debugfs_write(fd, "fail_gt_reset/times", "2");
>+
>+ xe_force_gt_reset(fd, 0);
Hmm... shouldn't we check that the writes above actually did anything? I
also don't see the kernel side, but if it just resets normally, the test
would still pass AFAICS.
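To make the "check the writes did anything" point concrete, here is a minimal sketch in plain POSIX C that writes an attribute and reads it back. The helper name write_and_verify_attr() is hypothetical and not part of IGT, and the sketch assumes the attribute echoes back the value written (possibly with a trailing newline). It only confirms the value was stored, not that the injected failure actually fired.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an IGT API): write 'val' to the attribute at
 * 'path', then read it back and compare. 'val' must not end in a newline.
 * Returns 0 if the value read back matches, -errno on I/O failure and
 * -EIO on a short write or a mismatch.
 */
static int write_and_verify_attr(const char *path, const char *val)
{
	char buf[64];
	size_t len = strlen(val);
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, val, len);
	if (n < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);
	if ((size_t)n != len)
		return -EIO;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);

	buf[n] = '\0';
	/* Attributes commonly append a newline; drop it before comparing. */
	buf[strcspn(buf, "\n")] = '\0';

	return strcmp(buf, val) == 0 ? 0 : -EIO;
}
```

In the test this would wrap the two fail_gt_reset writes, with the full debugfs path resolved by whatever means the test already has; a non-zero return would be a reason to skip or fail before calling xe_force_gt_reset().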
>+ sleep(1);
>+}
>+
>+static int reload_xe(int fd)
>+{
>+ int error;
>+
>+ drm_close_driver(fd);
>+ igt_xe_driver_unload();
What if we are running on e.g. MTL with a DG2 and want to debug one of
them? Rather than reloading the module and possibly causing unrelated
issues (e.g. if module removal crashes on the other card), why not
just unbind the driver from the card under test and bind it again?
I.e. the equivalent in C of:
	rebind() {
		pci_slot=$1
		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/unbind
		echo -n "0000:$pci_slot" > /sys/bus/pci/drivers/$driver/bind
	}
Lucas De Marchi
>+
>+ error = igt_xe_driver_load(NULL);
>+
>+ igt_assert_eq(error, 0);
>+
>+ /* driver is ready, check if it's bound */
>+ fd = __drm_open_driver(DRIVER_XE);
>+ igt_fail_on_f(fd < 0, "Cannot open the xe DRM driver while reloading xe after wedged\n");
>+ return fd;
>+}
>+
>+static int simple_ioctl(int fd)
>+{
>+ int ret;
>+
>+ struct drm_xe_vm_create create = {
>+ .extensions = 0,
>+ .flags = 0,
>+ };
>+
>+ ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
>+
>+ if (ret == 0)
>+ xe_vm_destroy(fd, create.vm_id);
>+
>+ return ret;
>+}
>+
>+/**
>+ * SUBTEST: basic-wedged
>+ * Description: Force Xe device wedged after injecting a failure in GT reset
>+ */
>+igt_main
>+{
>+ int fd;
>+
>+ igt_fixture {
>+ fd = drm_open_driver(DRIVER_XE);
>+ igt_require(igt_debugfs_exists(fd, "fail_gt_reset/probability",
>+ O_RDWR));
>+ }
>+
>+ igt_subtest("basic-wedged") {
>+ igt_assert_eq(simple_ioctl(fd), 0);
>+ force_wedged(fd);
>+ igt_assert_neq(simple_ioctl(fd), 0);
>+ fd = reload_xe(fd);
>+ igt_assert_eq(simple_ioctl(fd), 0);
>+ }
>+
>+ igt_fixture {
>+ if (igt_debugfs_exists(fd, "fail_gt_reset/probability", O_RDWR)) {
>+ igt_debugfs_write(fd, "fail_gt_reset/probability", "0");
>+ igt_debugfs_write(fd, "fail_gt_reset/times", "1");
>+ }
>+ drm_close_driver(fd);
>+ }
>+}
>diff --git a/tests/meson.build b/tests/meson.build
>index a856510fc..e590d4348 100644
>--- a/tests/meson.build
>+++ b/tests/meson.build
>@@ -312,6 +312,7 @@ intel_xe_progs = [
> 'xe_render_copy',
> 'xe_vm',
> 'xe_waitfence',
>+ 'xe_wedged',
> 'xe_spin_batch',
> 'xe_sysfs_defaults',
> 'xe_sysfs_scheduler',
>--
>2.44.0
>