[Intel-gfx] [PATCH i-g-t v16] tests: Add a test for device hot unplug
Janusz Krzysztofik
janusz.krzysztofik at linux.intel.com
Thu Apr 16 08:16:34 UTC 2020
From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
There is a test which verifies unloading of i915 driver module but no
test exists that checks how a driver behaves when it gets unbound from
a device or when the device gets unplugged. Implement such test using
sysfs interface.
Two minimalistic subtests - "unbind-rebind" and "unplug-rescan" -
perform the named operations on a DRM device which is believed to be
not in use. Another pair of subtests named "hotunbind-lateclose" and
hotunplug-lateclose" do the same on a DRM device while keeping its file
descriptor open and close it thereafter.
v2: Run a subprocess with dummy_load instead of external command
(Antonio).
v3: Run dummy_load from the test process directly (Antonio).
v4: Run dummy_load from inside subtests (Antonio).
v5: Try to restore the device to a working state after each subtest
(Petri, Daniel).
v6: Run workload inside an igt helper subprocess so resources consumed
by the workload are cleaned up automatically on workload subprocess
crash, without affecting test results,
- move the igt helper with workload back from subtests to initial
fixture so workload crash also does not affect test results,
- other cleanups suggested by Katarzyna and Chris.
v7: No changes.
v8: Move workload functions back from fixture to subtests,
- register different actions and different workloads in respective
tables and iterate over those tables while enumerating subtests,
- introduce new subtest flavors by simply omitting module unload step,
- instead of simply requesting bus rescan or not, introduce action
specific device recovery helpers, required specifically with those
new subtests not touching the module,
- split workload functions in two parts, one spawning the workload,
the other waiting for its completion,
- for the new subtests not requiring module unload, run workload
functions directly from the test process and use new workload
completion wait functions in place of subprocess completion wait,
- take more control over logging, longjumps and exit codes in
workload subprocesses,
- add some debug messages for easy progress watching,
- move function API descriptions on top of respective typedefs.
v9: All changes after Daniel's comments - thanks!
- flatten the code, don't try to create a midlayer (Daniel),
- provide minimal subtests that even don't keep device open (Daniel),
- don't use driver unbind in more advanced subtests (Daniel),
- provide subtests with different level of resources allocated
during device unplug (Daniel),
- provide subtests which check driver behavior after device hot
unplug (Daniel).
v10 Rename variables and function arguments to something that
indicates they're file descriptors (Daniel),
- introduce a data structure that contains various file descriptors
and a helper function to set them all (Daniel),
- fix strange indentation (Daniel),
- limit scope to first three subtests as the initial set of tests to
merge (Daniel).
v11 Fix typos in some comments,
- use SPDX license identifier,
- include a per-patch changelog in the commit message (Daniel).
v12 We don't use SPDX license identifiers nor GPL-2.0 in IGT (Petri),
- avoid chipset, make sure we reopen the same device (Chris),
- rename subtest "drm_open-hotunplug" to "hotunplug-lateclose",
- add subtest "hotunbind-lateclose" (less affected by IOMMU issues),
- move some redundant code to helpers,
- reorder some helpers,
- reword some messages and comments,
- clean up headers.
v13 Add test / subtest descriptions (patchwork).
v14 Extract redundant device rescan and reopen code to a 'healthcheck'
helper,
- call igt_abort_on_f() on device reopen failure (Petri),
- if any timeout set with igt_set_timeout() inside a subtest expires,
call igt_abort_on_f() from a follow-up igt_fixture (Petri),
- when running on a i915 device, require GEM and call
igt_abort_on_f() if no usable GEM is detected on device reopen.
v15 Add the test to CI blacklist (Martin).
v16 Separate blacklist entry with a descriptive comment (Petri).
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
Cc: Antonio Argenziano <antonio.argenziano at intel.com>
Cc: Daniel Vetter <daniel at ffwll.ch>
Cc: Katarzyna Dec <katarzyna.dec at intel.com>
Cc: Martin Peres <martin.peres at linux.intel.com>
Acked-by: Chris Wilson <chris at chris-wilson.co.uk>
Acked-by: Petri Latvala <petri.latvala at intel.com>
---
tests/Makefile.sources | 1 +
tests/core_hotunplug.c | 300 +++++++++++++++++++++++++++++++++++
tests/intel-ci/blacklist.txt | 4 +
tests/meson.build | 1 +
4 files changed, 306 insertions(+)
create mode 100644 tests/core_hotunplug.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 4e44c98c2..32cbbf4f9 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -18,6 +18,7 @@ TESTS_progs = \
core_getclient \
core_getstats \
core_getversion \
+ core_hotunplug \
core_setmaster \
core_setmaster_vs_auth \
debugfs_test \
diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
new file mode 100644
index 000000000..f9cfc8c3c
--- /dev/null
+++ b/tests/core_hotunplug.c
@@ -0,0 +1,300 @@
+/*
+ * Copyright © 2019 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <fcntl.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "igt.h"
+#include "igt_device_scan.h"
+#include "igt_kmod.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Examine behavior of a driver on device hot unplug");
+
+struct hotunplug {
+ struct {
+ int drm;
+ int sysfs_dev;
+ int sysfs_bus;
+ int sysfs_drv;
+ } fd;
+ char *dev_bus_addr;
+};
+
+/* Helpers */
+
+static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
+{
+ int len;
+
+ igt_assert(buflen);
+
+ priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
+ O_DIRECTORY);
+ igt_assert(priv->fd.sysfs_drv >= 0);
+
+ len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
+ buf[len] = '\0';
+ priv->dev_bus_addr = strrchr(buf, '/');
+ igt_assert(priv->dev_bus_addr++);
+
+ /* sysfs_dev no longer needed */
+ close(priv->fd.sysfs_dev);
+}
+
+static void prepare(struct hotunplug *priv, char *buf, int buflen)
+{
+ igt_debug("opening device\n");
+ priv->fd.drm = __drm_open_driver(DRIVER_ANY);
+ igt_assert(priv->fd.drm >= 0);
+
+ priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
+ igt_assert(priv->fd.sysfs_dev >= 0);
+
+ if (buf) {
+ prepare_for_unbind(priv, buf, buflen);
+ } else {
+ /* prepare for bus rescan */
+ priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
+ "device/subsystem", O_DIRECTORY);
+ igt_assert(priv->fd.sysfs_bus >= 0);
+ }
+}
+
+static const char *failure;
+
+/* Unbind the driver from the device */
+static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
+{
+ failure = "Driver unbind timeout!";
+ igt_set_timeout(60, failure);
+ igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
+ igt_reset_timeout();
+ failure = NULL;
+
+ /* don't close fd_sysfs_drv, it will be used for driver rebinding */
+}
+
+/* Re-bind the driver to the device */
+static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
+{
+ failure = "Driver re-bind timeout!";
+ igt_set_timeout(60, failure);
+ igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
+ igt_reset_timeout();
+ failure = NULL;
+
+ close(fd_sysfs_drv);
+}
+
+/* Remove (virtually unplug) the device from its bus */
+static void device_unplug(int fd_sysfs_dev)
+{
+ failure = "Device unplug timeout!";
+ igt_set_timeout(60, failure);
+ igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
+ igt_reset_timeout();
+ failure = NULL;
+
+ close(fd_sysfs_dev);
+}
+
+/* Re-discover the device by rescanning its bus */
+static void bus_rescan(int fd_sysfs_bus)
+{
+ failure = "Bus rescan timeout!";
+ igt_set_timeout(60, failure);
+ igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
+ igt_reset_timeout();
+ failure = NULL;
+
+ close(fd_sysfs_bus);
+}
+
+static void healthcheck(void)
+{
+ int fd_drm;
+
+ /* device name may have changed, rebuild IGT device list */
+ igt_devices_scan(true);
+
+ igt_debug("reopening the device\n");
+ fd_drm = __drm_open_driver(DRIVER_ANY);
+ igt_abort_on_f(fd_drm < 0, "Device reopen failure");
+
+ if (is_i915_device(fd_drm)) {
+ failure = "GEM failure";
+ igt_require_gem(fd_drm);
+ failure = NULL;
+ }
+
+ close(fd_drm);
+}
+
+static void set_filter_from_device(int fd)
+{
+ const char *filter_type = "sys:";
+ char filter[strlen(filter_type) + PATH_MAX + 1];
+ char *dst = stpcpy(filter, filter_type);
+ char path[PATH_MAX + 1];
+
+ igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
+ strncat(path, "/device", PATH_MAX - strlen(path));
+ igt_assert(realpath(path, dst));
+
+ igt_device_filter_set(filter);
+}
+
+/* Subtests */
+
+static void unbind_rebind(void)
+{
+ struct hotunplug priv;
+ char buf[PATH_MAX];
+
+ prepare(&priv, buf, sizeof(buf));
+
+ igt_debug("closing the device\n");
+ close(priv.fd.drm);
+
+ igt_debug("unbinding the driver from the device\n");
+ driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+
+ igt_debug("rebinding the driver to the device\n");
+ driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+
+ healthcheck();
+}
+
+static void unplug_rescan(void)
+{
+ struct hotunplug priv;
+
+ prepare(&priv, NULL, 0);
+
+ igt_debug("closing the device\n");
+ close(priv.fd.drm);
+
+ igt_debug("unplugging the device\n");
+ device_unplug(priv.fd.sysfs_dev);
+
+ igt_debug("recovering the device\n");
+ bus_rescan(priv.fd.sysfs_bus);
+
+ healthcheck();
+}
+
+static void hotunbind_lateclose(void)
+{
+ struct hotunplug priv;
+ char buf[PATH_MAX];
+
+ prepare(&priv, buf, sizeof(buf));
+
+ igt_debug("hot unbinding the driver from the device\n");
+ driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+
+ igt_debug("rebinding the driver to the device\n");
+ driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
+
+ igt_debug("late closing the unbound device instance\n");
+ close(priv.fd.drm);
+
+ healthcheck();
+}
+
+static void hotunplug_lateclose(void)
+{
+ struct hotunplug priv;
+
+ prepare(&priv, NULL, 0);
+
+ igt_debug("hot unplugging the device\n");
+ device_unplug(priv.fd.sysfs_dev);
+
+ igt_debug("recovering the device\n");
+ bus_rescan(priv.fd.sysfs_bus);
+
+ igt_debug("late closing the removed device instance\n");
+ close(priv.fd.drm);
+
+ healthcheck();
+}
+
+/* Main */
+
+igt_main
+{
+ igt_fixture {
+ int fd_drm;
+
+ /**
+ * As subtests must be able to close examined devices
+ * completely, don't use drm_open_driver() as it keeps
+ * a device file descriptor open for exit handler use.
+ */
+ fd_drm = __drm_open_driver(DRIVER_ANY);
+ igt_assert(fd_drm >= 0);
+
+ if (is_i915_device(fd_drm))
+ igt_require_gem(fd_drm);
+
+ /* Make sure subtests always reopen the same device */
+ set_filter_from_device(fd_drm);
+
+ close(fd_drm);
+ }
+
+ igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
+ igt_subtest("unbind-rebind")
+ unbind_rebind();
+
+ igt_fixture
+ igt_abort_on_f(failure, "%s\n", failure);
+
+ igt_describe("Check if a device believed to be closed can be cleanly unplugged");
+ igt_subtest("unplug-rescan")
+ unplug_rescan();
+
+ igt_fixture
+ igt_abort_on_f(failure, "%s\n", failure);
+
+ igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
+ igt_subtest("hotunbind-lateclose")
+ hotunbind_lateclose();
+
+ igt_fixture
+ igt_abort_on_f(failure, "%s\n", failure);
+
+ igt_describe("Check if a still open device can be cleanly unplugged, then released");
+ igt_subtest("hotunplug-lateclose")
+ hotunplug_lateclose();
+
+ igt_fixture
+ igt_abort_on_f(failure, "%s\n", failure);
+}
diff --git a/tests/intel-ci/blacklist.txt b/tests/intel-ci/blacklist.txt
index ee7045f03..08333afff 100644
--- a/tests/intel-ci/blacklist.txt
+++ b/tests/intel-ci/blacklist.txt
@@ -117,3 +117,7 @@ igt at .*@.*pipe-f($|-.*)
# Since 5.7-rc1, this test has produced tens of megabytes of kernel
# logs.
igt at perf_pmu@cpu-hotplug
+
+# Currently fails and leaves the machine in a very bad state, and
+# causes coverage loss for other tests.
+igt at core_hotunplug@.*
diff --git a/tests/meson.build b/tests/meson.build
index e882f4dcd..0bdcfbe4c 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -3,6 +3,7 @@ test_progs = [
'core_getclient',
'core_getstats',
'core_getversion',
+ 'core_hotunplug',
'core_setmaster',
'core_setmaster_vs_auth',
'debugfs_test',
--
2.21.1
More information about the Intel-gfx
mailing list