[Intel-gfx] [PATCH i-g-t v15] tests: Add a test for device hot unplug

Petri Latvala petri.latvala at intel.com
Thu Apr 16 07:09:44 UTC 2020


On Wed, Apr 15, 2020 at 03:15:15PM +0200, Janusz Krzysztofik wrote:
> From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> 
> There is a test which verifies unloading of i915 driver module but no
> test exists that checks how a driver behaves when it gets unbound from
> a device or when the device gets unplugged.  Implement such test using
> sysfs interface.
> 
> Two minimalistic subtests - "unbind-rebind" and "unplug-rescan" -
> perform the named operations on a DRM device which is believed to be
> not in use.  Another pair of subtests named "hotunbind-lateclose" and
> hotunplug-lateclose" do the same on a DRM device while keeping its file
> descriptor open and close it thereafter.
> 
> v2: Run a subprocess with dummy_load instead of external command
>     (Antonio).
> v3: Run dummy_load from the test process directly (Antonio).
> v4: Run dummy_load from inside subtests (Antonio).
> v5: Try to restore the device to a working state after each subtest
>     (Petri, Daniel).
> v6: Run workload inside an igt helper subprocess so resources consumed
>     by the workload are cleaned up automatically on workload subprocess
>     crash, without affecting test results,
>   - move the igt helper with workload back from subtests to initial
>     fixture so workload crash also does not affect test results,
>   - other cleanups suggested by Katarzyna and Chris.
> v7: No changes.
> v8: Move workload functions back from fixture to subtests,
>   - register different actions and different workloads in respective
>     tables and iterate over those tables while enumerating subtests,
>   - introduce new subtest flavors by simply omitting module unload step,
>   - instead of simply requesting bus rescan or not, introduce action
>     specific device recovery helpers, required specifically with those
>     new subtests not touching the module,
>   - split workload functions in two parts, one spawning the workload,
>     the other waiting for its completion,
>   - for the new subtests not requiring module unload, run workload
>     functions directly from the test process and use new workload
>     completion wait functions in place of subprocess completion wait,
>   - take more control over logging, longjumps and exit codes in
>     workload subprocesses,
>   - add some debug messages for easy progress watching,
>   - move function API descriptions on top of respective typedefs.
> v9: All changes after Daniel's comments - thanks!
>   - flatten the code, don't try to create a midlayer (Daniel),
>   - provide minimal subtests that even don't keep device open (Daniel),
>   - don't use driver unbind in more advanced subtests (Daniel),
>   - provide subtests with different level of resources allocated
>     during device unplug (Daniel),
>   - provide subtests which check driver behavior after device hot
>     unplug (Daniel).
> v10 Rename variables and function arguments to something that
>     indicates they're file descriptors (Daniel),
>   - introduce a data structure that contains various file descriptors
>     and a helper function to set them all (Daniel),
>   - fix strange indentation (Daniel),
>   - limit scope to first three subtests as the initial set of tests to
>     merge (Daniel).
> v11 Fix typos in some comments,
>   - use SPDX license identifier,
>   - include a per-patch changelog in the commit message (Daniel).
> v12 We don't use SPDX license identifiers nor GPL-2.0 in IGT (Petri),
>   - avoid chipset, make sure we reopen the same device (Chris),
>   - rename subtest "drm_open-hotunplug" to "hotunplug-lateclose",
>   - add subtest "hotunbind-lateclose" (less affected by IOMMU issues),
>   - move some redundant code to helpers,
>   - reorder some helpers,
>   - reword some messages and comments,
>   - clean up headers.
> v13 Add test / subtest descriptions (patchwork).
> v14 Extract redundant device rescan and reopen code to a 'healthcheck'
>     helper,
>   - call igt_abort_on_f() on device reopen failure (Petri),
>   - if any timeout set with igt_set_timeout() inside a subtest expires,
>     call igt_abort_on_f() from a follow-up igt_fixture (Petri),
>   - when running on a i915 device, require GEM and call
>     igt_abort_on_f() if no usable GEM is detected on device reopen.
> v15 Add the test to CI blacklist (Martin).
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Petri Latvala <petri.latvala at intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Katarzyna Dec <katarzyna.dec at intel.com>
> Cc: Martin Peres <martin.peres at linux.intel.com>
> Acked-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  tests/Makefile.sources       |   1 +
>  tests/core_hotunplug.c       | 300 +++++++++++++++++++++++++++++++++++
>  tests/intel-ci/blacklist.txt |   1 +
>  tests/meson.build            |   1 +
>  4 files changed, 303 insertions(+)
>  create mode 100644 tests/core_hotunplug.c
> 
> diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> index 4e44c98c2..32cbbf4f9 100644
> --- a/tests/Makefile.sources
> +++ b/tests/Makefile.sources
> @@ -18,6 +18,7 @@ TESTS_progs = \
>  	core_getclient \
>  	core_getstats \
>  	core_getversion \
> +	core_hotunplug \
>  	core_setmaster \
>  	core_setmaster_vs_auth \
>  	debugfs_test \
> diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> new file mode 100644
> index 000000000..f9cfc8c3c
> --- /dev/null
> +++ b/tests/core_hotunplug.c
> @@ -0,0 +1,300 @@
> +/*
> + * Copyright © 2019 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#include "igt.h"
> +#include "igt_device_scan.h"
> +#include "igt_kmod.h"
> +#include "igt_sysfs.h"
> +
> +IGT_TEST_DESCRIPTION("Examine behavior of a driver on device hot unplug");
> +
> +struct hotunplug {
> +	struct {
> +		int drm;
> +		int sysfs_dev;
> +		int sysfs_bus;
> +		int sysfs_drv;
> +	} fd;
> +	char *dev_bus_addr;
> +};
> +
> +/* Helpers */
> +
> +static void prepare_for_unbind(struct hotunplug *priv, char *buf, int buflen)
> +{
> +	int len;
> +
> +	igt_assert(buflen);
> +
> +	priv->fd.sysfs_drv = openat(priv->fd.sysfs_dev, "device/driver",
> +				    O_DIRECTORY);
> +	igt_assert(priv->fd.sysfs_drv >= 0);
> +
> +	len = readlinkat(priv->fd.sysfs_dev, "device", buf, buflen - 1);
> +	buf[len] = '\0';
> +	priv->dev_bus_addr = strrchr(buf, '/');
> +	igt_assert(priv->dev_bus_addr++);
> +
> +	/* sysfs_dev no longer needed */
> +	close(priv->fd.sysfs_dev);
> +}
> +
> +static void prepare(struct hotunplug *priv, char *buf, int buflen)
> +{
> +	igt_debug("opening device\n");
> +	priv->fd.drm = __drm_open_driver(DRIVER_ANY);
> +	igt_assert(priv->fd.drm >= 0);
> +
> +	priv->fd.sysfs_dev = igt_sysfs_open(priv->fd.drm);
> +	igt_assert(priv->fd.sysfs_dev >= 0);
> +
> +	if (buf) {
> +		prepare_for_unbind(priv, buf, buflen);
> +	} else {
> +		/* prepare for bus rescan */
> +		priv->fd.sysfs_bus = openat(priv->fd.sysfs_dev,
> +					    "device/subsystem", O_DIRECTORY);
> +		igt_assert(priv->fd.sysfs_bus >= 0);
> +	}
> +}
> +
> +static const char *failure;
> +
> +/* Unbind the driver from the device */
> +static void driver_unbind(int fd_sysfs_drv, const char *dev_bus_addr)
> +{
> +	failure = "Driver unbind timeout!";
> +	igt_set_timeout(60, failure);
> +	igt_sysfs_set(fd_sysfs_drv, "unbind", dev_bus_addr);
> +	igt_reset_timeout();
> +	failure = NULL;
> +
> +	/* don't close fd_sysfs_drv, it will be used for driver rebinding */
> +}
> +
> +/* Re-bind the driver to the device */
> +static void driver_bind(int fd_sysfs_drv, const char *dev_bus_addr)
> +{
> +	failure = "Driver re-bind timeout!";
> +	igt_set_timeout(60, failure);
> +	igt_sysfs_set(fd_sysfs_drv, "bind", dev_bus_addr);
> +	igt_reset_timeout();
> +	failure = NULL;
> +
> +	close(fd_sysfs_drv);
> +}
> +
> +/* Remove (virtually unplug) the device from its bus */
> +static void device_unplug(int fd_sysfs_dev)
> +{
> +	failure = "Device unplug timeout!";
> +	igt_set_timeout(60, failure);
> +	igt_sysfs_set(fd_sysfs_dev, "device/remove", "1");
> +	igt_reset_timeout();
> +	failure = NULL;
> +
> +	close(fd_sysfs_dev);
> +}
> +
> +/* Re-discover the device by rescanning its bus */
> +static void bus_rescan(int fd_sysfs_bus)
> +{
> +	failure = "Bus rescan timeout!";
> +	igt_set_timeout(60, failure);
> +	igt_sysfs_set(fd_sysfs_bus, "rescan", "1");
> +	igt_reset_timeout();
> +	failure = NULL;
> +
> +	close(fd_sysfs_bus);
> +}
> +
> +static void healthcheck(void)
> +{
> +	int fd_drm;
> +
> +	/* device name may have changed, rebuild IGT device list */
> +	igt_devices_scan(true);
> +
> +	igt_debug("reopening the device\n");
> +	fd_drm = __drm_open_driver(DRIVER_ANY);
> +	igt_abort_on_f(fd_drm < 0, "Device reopen failure");
> +
> +	if (is_i915_device(fd_drm)) {
> +		failure = "GEM failure";
> +		igt_require_gem(fd_drm);
> +		failure = NULL;
> +	}
> +
> +	close(fd_drm);
> +}
> +
> +static void set_filter_from_device(int fd)
> +{
> +	const char *filter_type = "sys:";
> +	char filter[strlen(filter_type) + PATH_MAX + 1];
> +	char *dst = stpcpy(filter, filter_type);
> +	char path[PATH_MAX + 1];
> +
> +	igt_assert(igt_sysfs_path(fd, path, PATH_MAX));
> +	strncat(path, "/device", PATH_MAX - strlen(path));
> +	igt_assert(realpath(path, dst));
> +
> +	igt_device_filter_set(filter);
> +}
> +
> +/* Subtests */
> +
> +static void unbind_rebind(void)
> +{
> +	struct hotunplug priv;
> +	char buf[PATH_MAX];
> +
> +	prepare(&priv, buf, sizeof(buf));
> +
> +	igt_debug("closing the device\n");
> +	close(priv.fd.drm);
> +
> +	igt_debug("unbinding the driver from the device\n");
> +	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> +	igt_debug("rebinding the driver to the device\n");
> +	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> +	healthcheck();
> +}
> +
> +static void unplug_rescan(void)
> +{
> +	struct hotunplug priv;
> +
> +	prepare(&priv, NULL, 0);
> +
> +	igt_debug("closing the device\n");
> +	close(priv.fd.drm);
> +
> +	igt_debug("unplugging the device\n");
> +	device_unplug(priv.fd.sysfs_dev);
> +
> +	igt_debug("recovering the device\n");
> +	bus_rescan(priv.fd.sysfs_bus);
> +
> +	healthcheck();
> +}
> +
> +static void hotunbind_lateclose(void)
> +{
> +	struct hotunplug priv;
> +	char buf[PATH_MAX];
> +
> +	prepare(&priv, buf, sizeof(buf));
> +
> +	igt_debug("hot unbinding the driver from the device\n");
> +	driver_unbind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> +	igt_debug("rebinding the driver to the device\n");
> +	driver_bind(priv.fd.sysfs_drv, priv.dev_bus_addr);
> +
> +	igt_debug("late closing the unbound device instance\n");
> +	close(priv.fd.drm);
> +
> +	healthcheck();
> +}
> +
> +static void hotunplug_lateclose(void)
> +{
> +	struct hotunplug priv;
> +
> +	prepare(&priv, NULL, 0);
> +
> +	igt_debug("hot unplugging the device\n");
> +	device_unplug(priv.fd.sysfs_dev);
> +
> +	igt_debug("recovering the device\n");
> +	bus_rescan(priv.fd.sysfs_bus);
> +
> +	igt_debug("late closing the removed device instance\n");
> +	close(priv.fd.drm);
> +
> +	healthcheck();
> +}
> +
> +/* Main */
> +
> +igt_main
> +{
> +	igt_fixture {
> +		int fd_drm;
> +
> +		/**
> +		 * As subtests must be able to close examined devices
> +		 * completely, don't use drm_open_driver() as it keeps
> +		 * a device file descriptor open for exit handler use.
> +		 */
> +		fd_drm = __drm_open_driver(DRIVER_ANY);
> +		igt_assert(fd_drm >= 0);
> +
> +		if (is_i915_device(fd_drm))
> +			igt_require_gem(fd_drm);
> +
> +		/* Make sure subtests always reopen the same device */
> +		set_filter_from_device(fd_drm);
> +
> +		close(fd_drm);
> +	}
> +
> +	igt_describe("Check if the driver can be cleanly unbound from a device believed to be closed");
> +	igt_subtest("unbind-rebind")
> +		unbind_rebind();
> +
> +	igt_fixture
> +		igt_abort_on_f(failure, "%s\n", failure);
> +
> +	igt_describe("Check if a device believed to be closed can be cleanly unplugged");
> +	igt_subtest("unplug-rescan")
> +		unplug_rescan();
> +
> +	igt_fixture
> +		igt_abort_on_f(failure, "%s\n", failure);
> +
> +	igt_describe("Check if the driver can be cleanly unbound from a still open device, then released");
> +	igt_subtest("hotunbind-lateclose")
> +		hotunbind_lateclose();
> +
> +	igt_fixture
> +		igt_abort_on_f(failure, "%s\n", failure);
> +
> +	igt_describe("Check if a still open device can be cleanly unplugged, then released");
> +	igt_subtest("hotunplug-lateclose")
> +		hotunplug_lateclose();
> +
> +	igt_fixture
> +		igt_abort_on_f(failure, "%s\n", failure);
> +}
> diff --git a/tests/intel-ci/blacklist.txt b/tests/intel-ci/blacklist.txt
> index ee7045f03..201f4b1b4 100644
> --- a/tests/intel-ci/blacklist.txt
> +++ b/tests/intel-ci/blacklist.txt
> @@ -117,3 +117,4 @@ igt at .*@.*pipe-f($|-.*)
>  # Since 5.7-rc1, this test has produced tens of megabytes of kernel
>  # logs.
>  igt at perf_pmu@cpu-hotplug
> +igt at core_hotunplug@.*


That makes it read like it's also "producing tens of megabytes of
kernel logs". An empty line, and a separate comment with something
like

# Currently fails and leaves the machine in a very bad state, and
# causes coverage loss for other tests.


With that,
Acked-by: Petri Latvala <petri.latvala at intel.com>


More information about the Intel-gfx mailing list