[Intel-gfx] [PATCH i-g-t v15] tests: Add a test for device hot unplug
Petri Latvala
petri.latvala at intel.com
Thu Apr 16 07:09:44 UTC 2020
On Wed, Apr 15, 2020 at 03:15:15PM +0200, Janusz Krzysztofik wrote:
> From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
>
> There is a test which verifies unloading of i915 driver module but no
> test exists that checks how a driver behaves when it gets unbound from
> a device or when the device gets unplugged. Implement such test using
> sysfs interface.
>
> Two minimalistic subtests - "unbind-rebind" and "unplug-rescan" -
> perform the named operations on a DRM device which is believed to be
> not in use. Another pair of subtests named "hotunbind-lateclose" and
> "hotunplug-lateclose" do the same on a DRM device while keeping its file
> descriptor open and close it thereafter.
>
> v2: Run a subprocess with dummy_load instead of external command
> (Antonio).
> v3: Run dummy_load from the test process directly (Antonio).
> v4: Run dummy_load from inside subtests (Antonio).
> v5: Try to restore the device to a working state after each subtest
> (Petri, Daniel).
> v6: Run workload inside an igt helper subprocess so resources consumed
> by the workload are cleaned up automatically on workload subprocess
> crash, without affecting test results,
> - move the igt helper with workload back from subtests to initial
> fixture so workload crash also does not affect test results,
> - other cleanups suggested by Katarzyna and Chris.
> v7: No changes.
> v8: Move workload functions back from fixture to subtests,
> - register different actions and different workloads in respective
> tables and iterate over those tables while enumerating subtests,
> - introduce new subtest flavors by simply omitting module unload step,
> - instead of simply requesting bus rescan or not, introduce action
> specific device recovery helpers, required specifically with those
> new subtests not touching the module,
> - split workload functions in two parts, one spawning the workload,
> the other waiting for its completion,
> - for the new subtests not requiring module unload, run workload
> functions directly from the test process and use new workload
> completion wait functions in place of subprocess completion wait,
> - take more control over logging, longjumps and exit codes in
> workload subprocesses,
> - add some debug messages for easy progress watching,
> - move function API descriptions on top of respective typedefs.
> v9: All changes after Daniel's comments - thanks!
> - flatten the code, don't try to create a midlayer (Daniel),
> - provide minimal subtests that don't even keep the device open (Daniel),
> - don't use driver unbind in more advanced subtests (Daniel),
> - provide subtests with different level of resources allocated
> during device unplug (Daniel),
> - provide subtests which check driver behavior after device hot
> unplug (Daniel).
> v10: Rename variables and function arguments to something that
> indicates they're file descriptors (Daniel),
> - introduce a data structure that contains various file descriptors
> and a helper function to set them all (Daniel),
> - fix strange indentation (Daniel),
> - limit scope to first three subtests as the initial set of tests to
> merge (Daniel).
> v11: Fix typos in some comments,
> - use SPDX license identifier,
> - include a per-patch changelog in the commit message (Daniel).
> v12: We don't use SPDX license identifiers nor GPL-2.0 in IGT (Petri),
> - avoid chipset, make sure we reopen the same device (Chris),
> - rename subtest "drm_open-hotunplug" to "hotunplug-lateclose",
> - add subtest "hotunbind-lateclose" (less affected by IOMMU issues),
> - move some redundant code to helpers,
> - reorder some helpers,
> - reword some messages and comments,
> - clean up headers.
> v13: Add test / subtest descriptions (patchwork).
> v14: Extract redundant device rescan and reopen code to a 'healthcheck'
> helper,
> - call igt_abort_on_f() on device reopen failure (Petri),
> - if any timeout set with igt_set_timeout() inside a subtest expires,
> call igt_abort_on_f() from a follow-up igt_fixture (Petri),
> - when running on an i915 device, require GEM and call
> igt_abort_on_f() if no usable GEM is detected on device reopen.
> v15: Add the test to CI blacklist (Martin).
>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Petri Latvala <petri.latvala at intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Katarzyna Dec <katarzyna.dec at intel.com>
> Cc: Martin Peres <martin.peres at linux.intel.com>
> Acked-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> tests/Makefile.sources | 1 +
> tests/core_hotunplug.c | 300 +++++++++++++++++++++++++++++++++++
> tests/intel-ci/blacklist.txt | 1 +
> tests/meson.build | 1 +
> 4 files changed, 303 insertions(+)
> create mode 100644 tests/core_hotunplug.c
>
> diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> index 4e44c98c2..32cbbf4f9 100644
> --- a/tests/Makefile.sources
> +++ b/tests/Makefile.sources
> @@ -18,6 +18,7 @@ TESTS_progs = \
> core_getclient \
> core_getstats \
> core_getversion \
> + core_hotunplug \
> core_setmaster \
> core_setmaster_vs_auth \
> debugfs_test \
> diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> new file mode 100644
> index 000000000..f9cfc8c3c
> --- /dev/null
> +++ b/tests/core_hotunplug.c
> @@ -0,0 +1,300 @@
> +/*
> + * Copyright © 2019 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#include "igt.h"
> +#include "igt_device_scan.h"
> +#include "igt_kmod.h"
> +#include "igt_sysfs.h"
> +
> +IGT_TEST_DESCRIPTION("Examine behavior of a driver on device hot unplug");
> +
> +struct hotunplug {
> + struct {
> + int drm;
> + int sysfs_dev;
> + int sysfs_bus;
> + int sysfs_drv;
> + } fd;
> + char *dev_bus_addr;
> +};
> +
> +/* Helpers */
> +
> +static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
> +{
> + int len;
> +
> + igt_assert(buflen);
> +
> + priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
> + O_DIRECTORY);
> + igt_assert(priv->fd.sysfs_drv >= 0);
> +
> + len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
> + buf[len] = '\0';
> + priv->dev_bus_addr = strrchr(buf, '/');
> + igt_assert(priv->dev_bus_addr++);
> +
> + /* sysfs_dev no longer needed */
> + close(priv->fd.sysfs_dev);
> +}
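The lookup above indexes buf with the readlinkat() return value without checking it first, so a failed readlinkat() (which returns -1) would write before the start of the buffer. A minimal sketch of a checked variant follows, assuming plain POSIX calls only; link_basename_at() is a hypothetical name for illustration and not part of the patch or of IGT.

```c
#define _GNU_SOURCE	/* assumption: glibc; exposes readlinkat() and O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: resolve the symlink
 * `name` relative to `dirfd` and return a pointer to its last path
 * component (the bus address for a sysfs "device" link), or NULL on
 * any failure. The result points into `buf`.
 */
static const char *link_basename_at(int dirfd, const char *name,
				    char *buf, size_t buflen)
{
	ssize_t len;
	char *slash;

	assert(buf && buflen > 1);

	len = readlinkat(dirfd, name, buf, buflen - 1);
	if (len < 0)
		return NULL;	/* never index buf with a negative length */
	buf[len] = '\0';	/* readlinkat() does not NUL-terminate */

	slash = strrchr(buf, '/');
	return slash ? slash + 1 : NULL;
}
```

In the test itself the same effect could presumably be had with a single igt_assert on len before the buf[len] store.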
> +
> +static void prepare(struct hotunplug *priv, char *buf, int buflen)
> +{
> + igt_debug("opening device\n");
> + priv->fd.drm = __drm_open_driver(DRIVER_ANY);
> + igt_assert(priv->fd.drm >= 0);
> +
> + priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
> + igt_assert(priv->fd.sysfs_dev >= 0);
> +
> + if (buf) {
> + prepare_for_unbind(priv, buf, buflen);
> + } else {
> + /* prepare for bus rescan */
> + priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
> + "device/subsystem", O_DIRECTORY);
> + igt_assert(priv->fd.sysfs_bus >= 0);
> + }
> +}
> +
> +static const char *failure;
> +
> +/* Unbind the driver from the device */
> +static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
> +{
> + failure = "Driver unbind timeout!";
> + igt_set_timeout(60, failure);
> + igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
> + igt_reset_timeout();
> + failure = NULL;
> +
> + /* don't close fd_sysfs_drv, it will be used for driver rebinding */
> +}
> +
> +/* Re-bind the driver to the device */
> +static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
> +{
> + failure = "Driver re-bind timeout!";
> + igt_set_timeout(60, failure);
> + igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
> + igt_reset_timeout();
> + failure = NULL;
> +
> + close(fd_sysfs_drv);
> +}
> +
> +/* Remove (virtually unplug) the device from its bus */
> +static void device_unplug(int fd_sysfs_dev)
> +{
> + failure = "Device unplug timeout!";
> + igt_set_timeout(60, failure);
> + igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
> + igt_reset_timeout();
> + failure = NULL;
> +
> + close(fd_sysfs_dev);
> +}
> +
> +/* Re-discover the device by rescanning its bus */
> +static void bus_rescan(int fd_sysfs_bus)
> +{
> + failure = "Bus rescan timeout!";
> + igt_set_timeout(60, failure);
> + igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
> + igt_reset_timeout();
> + failure = NULL;
> +
> + close(fd_sysfs_bus);
> +}
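All four helpers above follow one pattern: write a short string to a sysfs attribute ("unbind", "bind", "device/remove", "rescan") relative to an already open directory fd. For readers unfamiliar with IGT, a rough plain-POSIX approximation of that write is sketched below. attr_write_at() is a hypothetical stand-in, not the implementation of igt_sysfs_set(), and it leaves out the timeout handling the helpers add around the call.

```c
#define _GNU_SOURCE	/* assumption: glibc; exposes openat() */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the sysfs writes above, NOT IGT's
 * igt_sysfs_set(): write `value` to attribute `attr` below `dirfd`.
 * Returns true only if the whole string was written.
 */
static bool attr_write_at(int dirfd, const char *attr, const char *value)
{
	size_t len;
	ssize_t ret;
	int fd;

	assert(attr && value);
	len = strlen(value);

	fd = openat(dirfd, attr, O_WRONLY);
	if (fd < 0)
		return false;

	/* a single write(); a short or failed write is reported as failure */
	ret = write(fd, value, len);
	close(fd);

	return ret >= 0 && (size_t)ret == len;
}
```

With a driver directory fd, attr_write_at(fd_sysfs_drv, "unbind", dev_bus_addr) would roughly mirror driver_unbind() minus the igt_set_timeout() bracket.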
> +
> +static void healthcheck(void)
> +{
> + int fd_drm;
> +
> + /* device name may have changed, rebuild IGT device list */
> + igt_devices_scan(true);
> +
> + igt_debug("reopening the device\n");
> + fd_drm = __drm_open_driver(DRIVER_ANY);
> + igt_abort_on_f(fd_drm < 0, "Device reopen failure");
> +
> + if (is_i915_device(fd_drm)) {
> + failure = "GEM failure";
> + igt_require_gem(fd_drm);
> + failure = NULL;
> + }
> +
> + close(fd_drm);
> +}
> +
> +static void set_filter_from_device(int fd)
> +{
> + const char *filter_type = "sys:";
> + char filter[strlen(filter_type) + PATH_MAX + 1];
> + char *dst = stpcpy(filter, filter_type);
> + char path[PATH_MAX + 1];
> +
> + igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
> + strncat(path, "/device", PATH_MAX - strlen(path));
> + igt_assert(realpath(path, dst));
> +
> + igt_device_filter_set(filter);
> +}
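The buffer arithmetic in set_filter_from_device() is easy to misread, so here is the string-building step in isolation: the "sys:" prefix is copied first, and realpath() then writes the canonical path directly behind it. This is only a sketch; make_sys_filter() is a hypothetical name, and the real function additionally obtains the path from igt_sysfs_path() and hands the result to igt_device_filter_set().

```c
#define _GNU_SOURCE	/* assumption: glibc; exposes stpcpy() and realpath() */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "sys:<canonical path>" in `filter`.
 * `filter` must have room for strlen("sys:") + PATH_MAX bytes, since
 * realpath() with a caller-supplied buffer may write up to PATH_MAX
 * bytes (including the terminating NUL) at `dst`.
 * Returns `filter`, or NULL if `path` cannot be resolved.
 */
static char *make_sys_filter(const char *path, char *filter)
{
	char *dst;

	assert(path && filter);

	dst = stpcpy(filter, "sys:");	/* dst points at the NUL after the prefix */
	if (!realpath(path, dst))
		return NULL;

	return filter;
}
```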
> +
> +/* Subtests */
> +
> +static void unbind_rebind(void)
> +{
> + struct hotunplug priv;
> + char buf[PATH_MAX];
> +
> + prepare(&priv, buf, sizeof(buf));
> +
> + igt_debug("closing the device\n");
> + close(priv.fd.drm);
> +
> + igt_debug("unbinding the driver from the device\n");
> + driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> + igt_debug("rebinding the driver to the device\n");
> + driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> + healthcheck();
> +}
> +
> +static void unplug_rescan(void)
> +{
> + struct hotunplug priv;
> +
> + prepare(&priv, NULL, 0);
> +
> + igt_debug("closing the device\n");
> + close(priv.fd.drm);
> +
> + igt_debug("unplugging the device\n");
> + device_unplug(priv.fd.sysfs_dev);
> +
> + igt_debug("recovering the device\n");
> + bus_rescan(priv.fd.sysfs_bus);
> +
> + healthcheck();
> +}
> +
> +static void hotunbind_lateclose(void)
> +{
> + struct hotunplug priv;
> + char buf[PATH_MAX];
> +
> + prepare(&priv, buf, sizeof(buf));
> +
> + igt_debug("hot unbinding the driver from the device\n");
> + driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> + igt_debug("rebinding the driver to the device\n");
> + driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> + igt_debug("late closing the unbound device instance\n");
> + close(priv.fd.drm);
> +
> + healthcheck();
> +}
> +
> +static void hotunplug_lateclose(void)
> +{
> + struct hotunplug priv;
> +
> + prepare(&priv, NULL, 0);
> +
> + igt_debug("hot unplugging the device\n");
> + device_unplug(priv.fd.sysfs_dev);
> +
> + igt_debug("recovering the device\n");
> + bus_rescan(priv.fd.sysfs_bus);
> +
> + igt_debug("late closing the removed device instance\n");
> + close(priv.fd.drm);
> +
> + healthcheck();
> +}
> +
> +/* Main */
> +
> +igt_main
> +{
> + igt_fixture {
> + int fd_drm;
> +
> + /**
> + * As subtests must be able to close examined devices
> + * completely, don't use drm_open_driver() as it keeps
> + * a device file descriptor open for exit handler use.
> + */
> + fd_drm = __drm_open_driver(DRIVER_ANY);
> + igt_assert(fd_drm >= 0);
> +
> + if (is_i915_device(fd_drm))
> + igt_require_gem(fd_drm);
> +
> + /* Make sure subtests always reopen the same device */
> + set_filter_from_device(fd_drm);
> +
> + close(fd_drm);
> + }
> +
> + igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
> + igt_subtest("unbind-rebind")
> + unbind_rebind();
> +
> + igt_fixture
> + igt_abort_on_f(failure, "%s\n", failure);
> +
> + igt_describe("Check if a device believed to be closed can be cleanly unplugged");
> + igt_subtest("unplug-rescan")
> + unplug_rescan();
> +
> + igt_fixture
> + igt_abort_on_f(failure, "%s\n", failure);
> +
> + igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
> + igt_subtest("hotunbind-lateclose")
> + hotunbind_lateclose();
> +
> + igt_fixture
> + igt_abort_on_f(failure, "%s\n", failure);
> +
> + igt_describe("Check if a still open device can be cleanly unplugged, then released");
> + igt_subtest("hotunplug-lateclose")
> + hotunplug_lateclose();
> +
> + igt_fixture
> + igt_abort_on_f(failure, "%s\n", failure);
> +}
> diff --git a/tests/intel-ci/blacklist.txt b/tests/intel-ci/blacklist.txt
> index ee7045f03..201f4b1b4 100644
> --- a/tests/intel-ci/blacklist.txt
> +++ b/tests/intel-ci/blacklist.txt
> @@ -117,3 +117,4 @@ igt@.*@.*pipe-f($|-.*)
> # Since 5.7-rc1, this test has produced tens of megabytes of kernel
> # logs.
> igt@perf_pmu@cpu-hotplug
> +igt@core_hotunplug@.*
Placed directly under that comment, the new entry reads as if it is also
"producing tens of megabytes of kernel logs". Please add an empty line
and a separate comment, something like:
# Currently fails and leaves the machine in a very bad state, and
# causes coverage loss for other tests.
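As a sketch, assuming the entry stays at the end of the file as in the hunk above, the tail of tests/intel-ci/blacklist.txt would then look roughly like this (the exact comment wording is only a suggestion):

```text
# Since 5.7-rc1, this test has produced tens of megabytes of kernel
# logs.
igt@perf_pmu@cpu-hotplug

# Currently fails and leaves the machine in a very bad state, and
# causes coverage loss for other tests.
igt@core_hotunplug@.*
```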
With that,
Acked-by: Petri Latvala <petri.latvala at intel.com>