[igt-dev] [PATCH] tests: Add a new test for driver/device hot reload

Katarzyna Dec katarzyna.dec at intel.com
Thu Apr 4 13:56:44 UTC 2019


On Wed, Apr 03, 2019 at 06:01:22PM +0200, Janusz Krzysztofik wrote:

Sorry for joining the discussion in v2 of this patch :)
> From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> 
> Run a dummy load in background to put some workload on a device, then try
> to either remove (unplug) the device from its bus, or unbind the device's
> driver from it, depending on which subtest has been selected.  If
> succeeded, unload the driver, rescan the device's bus if needed and
> perform health checks on the device with the driver reloaded.
> 
> The driver hot unbind / device hot unplug operation is expected to succeed
> in a reasonable time, however long timeouts are used to let kernel level
> timeouts pop up first if hit by a bug.
> 
> The dummy load works only with i915 driver.  The test is skipped on other
> hardware unless they provide their implementation of igt_spin_batch_new()
> and friends.
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> ---
> Changelog:
> v1->v2:
> - renamed to core_hot_reload,
> - extended with module unload, device rediscover (unplug case), module reload 
>   and healthcheck operations following the initial unplug/unbind, as requested
>   by Daniel and Peter,
> - fail if anything goes wrong, as requested by Daniel
> - refactored a lot to reduce code redundancy.
> 
> Resending now with the igt-dev list address added to Cc:, sorry
nit: I do not know if this is a strict rule or just everyone does that, but usually
we put what has changed as a part of commit msg above any Sign-off-by or CC.
And as soon as this is a v2 you could mark it in the patch name (during
generating patch for review).
> 
>  tests/Makefile.sources  |   1 +
>  tests/core_hot_reload.c | 248 ++++++++++++++++++++++++++++++++++++++++
>  tests/meson.build       |   1 +
>  3 files changed, 250 insertions(+)
>  create mode 100644 tests/core_hot_reload.c
> 
> diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> index 86ad301e..5f9dba30 100644
> --- a/tests/Makefile.sources
> +++ b/tests/Makefile.sources
> @@ -16,6 +16,7 @@ TESTS_progs = \
>  	core_getclient \
>  	core_getstats \
>  	core_getversion \
> +	core_hot_reload \
>  	core_setmaster_vs_auth \
>  	debugfs_test \
>  	drm_import_export \
> diff --git a/tests/core_hot_reload.c b/tests/core_hot_reload.c
> new file mode 100644
> index 00000000..96f0efd0
> --- /dev/null
> +++ b/tests/core_hot_reload.c
> @@ -0,0 +1,248 @@
> +/*
> + * Copyright © 2019 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "igt.h"
> +#include "igt_device.h"
> +#include "igt_dummyload.h"
> +#include "igt_kmod.h"
> +#include "igt_sysfs.h"
> +
> +#include <getopt.h>
> +#include <limits.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +
> +typedef void (*workload_wait_t)(int device, void *priv);
> +typedef int (*action_t)(int dir);
> +typedef int (*workload_t)(int device, void *priv, action_t action);
> +
> +
> +/*
> + * Actions
> + */
> +
> +/* Unbind the driver from the device */
It would be nice if you add docs to all/almost all functions like:
/**
 *driver_unbind:
 *@dir: fd of directory to look in
 *
 * Unbind the driver from the device
 * Returns -1 for ?
 */
 Example is not finished. See other files for examples.
> +static int driver_unbind(int dir)
> +{
> +	char path[PATH_MAX], *dev_bus_addr;
> +	int len;
> +
> +	len = readlinkat(dir, "device", path, sizeof(path) - 1);
> +	path[len] = '\0';
> +	dev_bus_addr = strrchr(path, '/') + 1;
> +
> +	igt_set_timeout(60, "Driver unbind timeout!");
> +	igt_sysfs_set(dir, "device/driver/unbind", dev_bus_addr);
> +
> +	/* No need for bus rescan */
> +	return -1;
Why do we always want to return -1?
> +}
> +
> +/* Returns open file descriptor to a directory of a bus to rescan for re-plug */
s/open/opened/
> +static int device_unplug(int dir)
> +{
> +	int bus;
> +
> +	bus = openat(dir, "device/subsystem", O_DIRECTORY);
> +	igt_assert(bus >= 0);
> +
> +	igt_set_timeout(60, "Device unplug timeout!");
> +	igt_sysfs_set(dir, "device/remove", "1");
> +
> +	return bus;
> +}
> +
> +
> +/*
> + * Operation template
> + */
> +static int operation(int device, action_t action, workload_wait_t workload_wait,
> +		     void *priv)
> +{
> +	int dir, bus;
> +
> +	dir = igt_sysfs_open(device);
> +
> +	bus = action(dir);
> +	close(dir);
> +
> +	if (workload_wait)
> +		workload_wait(device, priv);
> +	igt_reset_timeout();
> +
> +	return bus;
> +}
> +
> +
> +/*
> + * Workloads
> + */
> +
> +/* Workload using igt_spin_batch_run() */
> +
> +static void spin_batch_wait(int device, void *priv)
> +{
> +	igt_spin_t *spin = priv;
> +
> +	/* wait until the workload has crashed */
> +	gem_wait(device, spin->handle, NULL);
> +}
> +
> +static int spin_batch(int device, void *priv, action_t action)
> +{
> +	igt_spin_t *spin;
> +	int bus;
> +
> +	igt_assert(!priv);
> +
> +	/* Start the workload in background.
> +	 * Make sure it is running and times out after 90 seconds */
Plase use drm/scripts/checkpatch.pl to check formating of this ^ commend and the
rest of the file.
> +	spin = igt_spin_batch_new(device, .flags = IGT_SPIN_POLL_RUN);
> +	igt_spin_busywait_until_running(spin);
> +	igt_spin_batch_set_timeout(spin, 90000000000);
> +
> +	/* run the tested action */
> +	bus = operation(device, action, spin_batch_wait, spin);
> +
> +	/*
> +	 * Clean up driver resources possibly not released on workload crash
> +	 * so the driver module can be successfully unloaded
> +	 */
> +	igt_spin_batch_free(device, spin);
> +
> +	return bus;
> +}
> +
> +
> +/*
> + * Skeleton
> + */
> +
> +static void healthcheck(int chipset, int *device)
> +{
> +	if (chipset == DRIVER_INTEL) {
> +		/*
> +		 * We have it perfectly implemented in i915_module_load,
> +		 * just use it.
> +		 */
> +		igt_assert(igt_system_quiet("i915_module_load --run-subtest reload")
> +			   == IGT_EXIT_SUCCESS);
> +	} else {
> +		/*
> +		 * We don't know how to check an unidentified device for health,
> +		 * just reopen it.
> +		 */
> +	}
If we do not know/do not want to know about other devices, we could add
requirement to whole binary:
fd = drm_open_driver(DRIVER_INTEL);
igt_require_gem(fd);
> +}
> +
> +static void driver_unload(int chipset, char *driver)
> +{
> +	if (chipset == DRIVER_INTEL)
> +		igt_assert(igt_i915_driver_unload() == IGT_EXIT_SUCCESS);
> +	else
> +		igt_assert(igt_kmod_unload(driver, 0) == 0);
> +}
> +
> +static void __subtest(int device, int chipset, char *driver,
> +		      workload_t workload, void *priv, action_t action)
> +{
> +	int bus = workload(device, priv, action);
> +
> +	close(device);
> +	driver_unload(chipset, driver);
> +
> +	/* Valid file descriptor indicates we should rescan the bus */
> +	if (bus >= 0) {
> +		igt_sysfs_set(bus, "rescan", "1");
> +		close(bus);
> +	}
> +}
> +
> +static void run_subtest(int *device, int chipset, char *driver,
> +			workload_t workload, void *priv, action_t action)
> +{
> +	__subtest(*device, chipset, driver, workload, priv, action);
> +
> +	sleep(2);
Why do we need 2 seconds of sleep here? Where is the data coming from?
> +
> +	*device = __drm_open_driver(chipset);
> +	healthcheck(chipset, device);
> +
> +	igt_assert(*device >= 0);
> +}
> +
> +
> +igt_main {
> +	int device, chipset;
> +	char *driver;
> +
> +	igt_fixture {
> +		char path[PATH_MAX];
> +		int dir, len;
> +
> +		/*
> +		 * Since the test depends on successful unload of driver module,
> +		 * don't use drm_open_driver() as it keeps a file descriptor
> +		 * open for exit handler use that effectively locks the module.
> +		 */
> +		device = __drm_open_driver(DRIVER_ANY);
Based on what you've wrote in commit msg when dummy load works only on
DRIVER_INTEL, why do we even care about other drivers? Why don't we open INTEL
here and skip on other types?

Quite a nice job done here :) Looks like you are going into right direction.
Kasia :)
> +		igt_assert(device >= 0);
> +
> +		if (is_i915_device(device)) {
> +			chipset = DRIVER_INTEL;
> +			driver = strdup("i915");
> +		} else {
> +			chipset = DRIVER_ANY;
> +
> +			/* Capture module name to be unloaded */
> +			dir = igt_sysfs_open(device);
> +			len = readlinkat(dir, "device/driver/module", path,
> +					 sizeof(path) - 1);
> +			close(dir);
> +			path[len] = '\0';
> +			driver = strdup(strrchr(path, '/') + 1);
> +		}
> +		igt_info("Running the test on driver \"%s\", chipset mask %#0x\n",
> +			 driver, chipset);
> +	}
> +
> +	igt_subtest("unplug")
> +		run_subtest(&device, chipset, driver, spin_batch, NULL,
> +			    device_unplug);
> +
> +	igt_subtest("unbind")
> +		run_subtest(&device, chipset, driver, spin_batch, NULL,
> +			    driver_unbind);
> +
> +	igt_fixture {
> +		free(driver);
> +		close(device);
> +		/*
> +		 * Now reopen the device with drm_open_driver() for a while
> +		 * so exit handler is registered and can do its job on exit.
> +		 */
> +		device = drm_open_driver(chipset);
> +		close(device);
> +	}
> +}
> diff --git a/tests/meson.build b/tests/meson.build
> index 8e1ae4fd..e6b87447 100644
> --- a/tests/meson.build
> +++ b/tests/meson.build
> @@ -3,6 +3,7 @@ test_progs = [
>  	'core_getclient',
>  	'core_getstats',
>  	'core_getversion',
> +	'core_hot_reload',
>  	'core_setmaster_vs_auth',
>  	'debugfs_test',
>  	'drm_import_export',
> -- 
> 2.20.1
> 


More information about the igt-dev mailing list