[igt-dev] [PATCH v5 1/2] tests: Add a new test for driver/device hot reload
Janusz Krzysztofik
janusz.krzysztofik at linux.intel.com
Wed Apr 10 09:03:58 UTC 2019
On Tuesday, April 9, 2019 4:50:49 PM CEST Katarzyna Dec wrote:
> On Tue, Apr 09, 2019 at 01:10:58PM +0200, Janusz Krzysztofik wrote:
> > From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> >
> > Run a dummy load in background to put some workload on a device, then try
> > to either remove (unplug) the device from its bus, or unbind the device's
> > driver from it, depending on which subtest has been selected. If
> > succeeded, unload the driver, rescan the device's bus if needed and
> > perform health checks on the device with the driver reloaded.
> >
> > The dummy load is run from igt_fixture and in a sub-process, not directly
> > from subtests, as it is expected to fail and it's more simple to ignore
> > igt_abort() in a sub-process. Moreover, as soon as the sub-process fails
> > and exits, resources it was using are freed automatically so there is no
> > need to do any cleanups required for smooth module unload from the test
> > level itself. Those cleanups might also make the subtests fail if simply
> > using igt library functions for that instead of reimplementing their safe
> > parts only.
> >
> > The driver hot unbind / device hot unplug operation is expected to succeed
> > and the background workload sub-process to die in a reasonable time,
> > however long timeouts are used to let kernel level timeouts pop up first
> > if hit by a bug.
> >
> > The dummy load works only on i915 driver. The test is skipped on other
> > hardware unless they provide their implementation of igt_spin_batch_new()
> > and friends.
> >
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> > ---
> >
> > tests/Makefile.sources | 1 +
> > tests/core_hot_reload.c | 256 ++++++++++++++++++++++++++++++++++++++++
> > tests/meson.build | 1 +
> > 3 files changed, 258 insertions(+)
> > create mode 100644 tests/core_hot_reload.c
> >
> > diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> > index 214698da..d2c0941d 100644
> > --- a/tests/Makefile.sources
> > +++ b/tests/Makefile.sources
> > @@ -16,6 +16,7 @@ TESTS_progs = \
> >
> > core_getclient \
> > core_getstats \
> > core_getversion \
> >
> > + core_hot_reload \
> >
> > core_setmaster_vs_auth \
> > debugfs_test \
> > drm_import_export \
> >
> > diff --git a/tests/core_hot_reload.c b/tests/core_hot_reload.c
> > new file mode 100644
> > index 00000000..d862c99c
> > --- /dev/null
> > +++ b/tests/core_hot_reload.c
> > @@ -0,0 +1,256 @@
> > +/*
> > + * Copyright © 2019 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + */
> > +
> > +#include "igt.h"
> > +#include "igt_device.h"
> > +#include "igt_dummyload.h"
> > +#include "igt_kmod.h"
> > +#include "igt_sysfs.h"
> > +
> > +#include <getopt.h>
> > +#include <limits.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +
> > +
> > +typedef int (*action_t)(int dir);
> > +typedef void (*workload_wait_t)(void *priv);
> > +typedef void (*workload_t)(int device, const void *priv);
> > +
> > +
> > +/*
> > + * Actions
> > + *
> > + * Purpose: make the device disappear
> > + *
> > + * @dir: file descriptor of an open device sysfs directory
> > + *
> > + * Return value: file descriptor of an open device bus' sysfs directory
> > + * or -1 if no bus rescan is needed
> > + */
The above comment describes not a particular function but rather a pattern
that functions performing particular test actions should follow. Definitions
of two such functions follow. Their callers expect the return value to be an fd
of a sysfs directory corresponding to a bus to be rescanned, or -1 if no bus
rescan is needed.
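
To make that convention concrete, here is a minimal stand-alone sketch. It is
not code from the patch: fake_unbind(), fake_unplug(), run_action() and
NO_RESCAN are hypothetical names, and plain integers stand in for real sysfs
file descriptors. Only the action_t typedef mirrors the one in the test.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the action_t typedef in the patch. */
typedef int (*action_t)(int dir);

/* Hypothetical name for the "no bus rescan needed" sentinel. */
#define NO_RESCAN (-1)

/* Stand-in for an action that needs no bus rescan (cf. driver_unbind()). */
static int fake_unbind(int dir)
{
	(void)dir;
	return NO_RESCAN;
}

/* Stand-in for an action that hands back a bus fd (cf. device_unplug()).
 * The arithmetic only pretends to be an openat() on the bus directory. */
static int fake_unplug(int dir)
{
	return dir + 100;
}

/* Caller side: run the action, report the value it returned through
 * *bus_out, and return 1 if a rescan would be requested, 0 otherwise. */
static int run_action(action_t action, int dir, int *bus_out)
{
	int bus = action(dir);

	assert(bus_out != NULL);
	*bus_out = bus;

	/* Any valid (non-negative) fd means "rescan this bus". */
	return bus >= 0;
}
```

With these stand-ins, run_action(fake_unbind, 3, &bus) reports that no rescan
is wanted, while run_action(fake_unplug, 3, &bus) reports that one is.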
> > +
> > +/* Unbind the driver from the device */
>
> I am glad that more documentation has come. I would only prefer to have it
> somehow consistent for functions in the whole binary - please have either:
> /**
> * name:
> * variables
> *
> * Return values
> */
> or simple comments /* */
> But this is just a tiny thing :)
I hope the explanation above, though aimed mainly at clarifying the -1 case,
also addresses your concern about the comments.
> > +static int driver_unbind(int dir)
> > +{
> > + char path[PATH_MAX], *dev_bus_addr;
> > + int len;
> > +
> > + len = readlinkat(dir, "device", path, sizeof(path) - 1);
> > + path[len] = '\0';
> > + dev_bus_addr = strrchr(path, '/') + 1;
> > +
> > + igt_set_timeout(60, "Driver unbind timeout!");
> > + igt_sysfs_set(dir, "device/driver/unbind", dev_bus_addr);
> > +
> > + /* No need for bus rescan */
> > + return -1;
>
> In the doc above you wrote that this function returns -1 or an fd of the
> device. It is possible that I am wrong, but I do not see an fd returned (at
> least not explicitly).
> ^^^^ So this was my comment before I read carefully later on. So it looks
> like the doc is wrong, and you return -1 and nothing else. Why do we need to
> return a const here? Maybe we can return the value from igt_sysfs_set?
This particular test action function above always returns -1 because no bus
rescan is needed. Its caller will know what -1 means.
Please note that the next test action function below returns a bus fd because
in that case a bus rescan is required.
>
> > +}
> > +
> > +/* Remove (virtually unplug) the device from its bus */
> > +static int device_unplug(int dir)
> > +{
> > + int bus;
> > +
> > + bus = openat(dir, "device/subsystem", O_DIRECTORY);
> > + igt_assert(bus >= 0);
> > +
> > + igt_set_timeout(60, "Device unplug timeout!");
> > + igt_sysfs_set(dir, "device/remove", "1");
> > +
> > + return bus;
> > +}
> > +
There are no more test action functions, only the two above: one requires a
bus rescan, the other does not. :-)
> > +
> > +/*
> > + * Workloads
> > + *
> > + * Purpose: Put some long lasting load on the device
> > + *
> > + * @device: open device file descriptor,
> > + * @priv: pointer to an optional argument passed to the workload
> > + *
> > + * Return value: none
> > + */
As with the test action functions, the comment above describes a pattern
that functions applying a workload should follow. Only one such
function, the one below, is implemented.
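
For readers unfamiliar with the idea of a workload that is expected to die in a
sub-process, here is a rough sketch using plain fork() and waitpid(). It does
not use igt_fork_helper() or igt_wait_helper() as the test does;
workload_start(), workload_wait(), crashing_workload() and quiet_workload()
are hypothetical names, and abort() merely stands in for a workload that fails
once the device is gone.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same shape as the workload_t typedef in the patch. */
typedef void (*workload_t)(int device, const void *priv);

/* Hypothetical workload that fails, as the real one is expected to. */
static void crashing_workload(int device, const void *priv)
{
	(void)device;
	(void)priv;
	abort();
}

/* Hypothetical workload that finishes cleanly. */
static void quiet_workload(int device, const void *priv)
{
	(void)device;
	(void)priv;
}

/* Run the workload in a child process; returns the child's pid,
 * or -1 if fork() failed. */
static pid_t workload_start(workload_t workload, int device, const void *priv)
{
	pid_t pid = fork();

	if (pid == 0) {
		workload(device, priv);
		_exit(0);
	}

	return pid;
}

/* Wait for the child. Returns 0 if it exited cleanly, 1 if it was killed
 * by a signal or exited with a non-zero status, -1 on error. Whatever the
 * child had open is released by the kernel when it exits. */
static int workload_wait(pid_t pid)
{
	int status;

	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		return -1;

	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;

	return 1;
}
```

The parent only observes how the child ended; it never has to clean up after
the workload itself, which is the point made in the commit message.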
> > +
> > +/* Workload using igt_spin_batch_run() */
> > +
> > +static void spin_batch(int device, const void *priv)
> > +{
> > + igt_spin_t *spin;
> > +
> > + /* submit the job */
> > + spin = igt_spin_batch_new(device);
> > +
> > + /* wait for the job to crash */
> > + gem_sync(device, spin->handle);
> > +
> > + /* clean up if still possible */
> > + igt_spin_batch_free(device, spin);
> > +}
> > +
> > +
> > +/*
> > + * Skeleton
> > + */
> > +
> > +static void healthcheck(int chipset)
> > +{
> > + if (chipset == DRIVER_INTEL) {
> > + /*
> > + * We have it perfectly implemented in i915_module_load,
> > + * just use it.
> > + */
> > + igt_assert(igt_system_quiet("i915_module_load --run-subtest reload")
> > + == IGT_EXIT_SUCCESS);
> > + } else {
> > + /*
> > + * We don't know how to check an unidentified device for health,
> > + * device reopen must suffice.
> > + */
> > + }
>
> Just a question - maybe we can add something like
> 'igt_skip_on(!DRIVER_INTEL)' with the same comment as in the else statement?
> It is only another approach :)
I think non-Intel people are better placed to decide whether they want the
test skipped or still run without exhaustive health checking.
> > +}
> > +
> > +static void driver_unload(int chipset, char *driver)
> > +{
> > + if (chipset == DRIVER_INTEL)
> > + igt_assert(igt_i915_driver_unload() == IGT_EXIT_SUCCESS);
> > + else
> > + igt_assert(igt_kmod_unload(driver, 0) == 0);
> > +}
> > +
> > +static int operation(int device, action_t action, workload_wait_t workload_wait,
> > + void *workload_priv)
> > +{
> > + int dir, bus;
> > +
> > + dir = igt_sysfs_open(device);
> > +
> > + bus = action(dir);
> > + close(dir);
> > +
> > + if (workload_wait)
> > + workload_wait(workload_priv);
> > + igt_reset_timeout();
> > +
> > + return bus;
> > +}
> > +
> > +
> > +static void __subtest(int device, int chipset, char *driver, action_t action,
> > + workload_wait_t workload_wait, void *workload_priv)
> > +{
> > + int bus = operation(device, action, workload_wait, workload_priv);
> > +
> > + close(device);
> > + driver_unload(chipset, driver);
> > +
> > + /* Valid file descriptor indicates we should rescan the bus */
> > + if (bus >= 0) {
> > + igt_sysfs_set(bus, "rescan", "1");
> > + close(bus);
> > + }
> > +}
And here above we have the recipient of the values returned by the test action
functions. Depending on the return value, it may rescan the bus that the
value represents.
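
For illustration only, the rescan step could look roughly like this when
written with plain POSIX calls instead of igt_sysfs_set(). attr_set() and
rescan_if_needed() are hypothetical helpers, not IGT API, and the sketch
assumes the bus directory exposes a writable "rescan" attribute, as the test
itself assumes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string to an attribute file located
 * relative to an open directory fd. Returns 0 on success, -1 on failure. */
static int attr_set(int dirfd, const char *attr, const char *value)
{
	size_t len;
	ssize_t written;
	int fd;

	assert(attr != NULL && value != NULL);
	len = strlen(value);

	fd = openat(dirfd, attr, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, value, len);
	close(fd);

	return written == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: act on the value returned by a test action.
 * Returns 0 if no rescan was requested (bus < 0), 1 if the rescan was
 * triggered, -1 if writing the attribute failed. Closes a valid bus fd. */
static int rescan_if_needed(int bus)
{
	int ret;

	if (bus < 0)
		return 0;

	ret = attr_set(bus, "rescan", "1");
	close(bus);

	return ret == 0 ? 1 : -1;
}
```

Unlike __subtest() in the patch, this variant reports whether the write
succeeded, so a caller could assert on it if desired.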
> > +
> > +static void run_subtest(int *device, int chipset, char *driver, action_t action,
> > + workload_wait_t workload_wait, void *workload_priv)
> > +{
> > + __subtest(*device, chipset, driver, action, workload_wait, workload_priv);
> > +
> > + healthcheck(chipset);
> > +
> > + *device = __drm_open_driver(chipset);
> > + igt_assert(*device >= 0);
> > +}
> > +
> > +static void __workload(workload_t workload, int device, const void *priv,
> > + struct igt_helper_process *proc)
> > +{
> > + igt_fork_helper(proc)
> > + workload(device, priv);
> > + /* let the background process start doing its job */
> > + sleep(2);
> > +}
> > +
> > +static void __workload_wait(void *priv)
> > +{
> > + struct igt_helper_process *proc = priv;
> > +
> > + /* wait until the workload has crashed */
> > + igt_wait_helper(proc);
> > +}
> > +
> > +
> > +igt_main {
> > + int device, chipset;
> > + char *driver;
> > + struct igt_helper_process proc = {};
> > + workload_wait_t workload_wait;
> > + void *workload_priv;
> > +
> > + igt_fixture {
> > + char path[PATH_MAX];
> > + int dir, len;
> > +
> > + /*
> > + * Since the test depends on successful unload of driver module,
> > + * don't use drm_open_driver() as it keeps a file descriptor
> > + * open for exit handler use that effectively locks the module.
> > + */
> > + device = __drm_open_driver(DRIVER_ANY);
> > + igt_assert(device >= 0);
> > +
> > + if (is_i915_device(device)) {
> > + chipset = DRIVER_INTEL;
> > + driver = strdup("i915");
> > + } else {
> > + chipset = DRIVER_ANY;
> > +
> > + /* Capture module name to be unloaded */
> > + dir = igt_sysfs_open(device);
> > + len = readlinkat(dir, "device/driver/module", path,
> > + sizeof(path) - 1);
> > + close(dir);
> > + path[len] = '\0';
> > + driver = strdup(strrchr(path, '/') + 1);
> > + }
> > + igt_info("Running the test on driver \"%s\", chipset mask %#0x\n",
> > + driver, chipset);
> > +
> > + workload_wait = __workload_wait;
> > + workload_priv = &proc;
> > + __workload(spin_batch, device, NULL, &proc);
> > + }
> > +
> > + igt_subtest("unplug")
> > + run_subtest(&device, chipset, driver, device_unplug,
> > + workload_wait, workload_priv);
> > +
> > + igt_subtest("unbind")
> > + run_subtest(&device, chipset, driver, driver_unbind,
> > + workload_wait, workload_priv);
> > +
> > + igt_fixture {
> > + free(driver);
> > + close(device);
> > + }
> > +}
>
> Generally everything looks ok. A few fixes and this can be it.
> I really like your approach with test building blocks.
I'm glad to hear that, and thanks for your review.
Janusz
>
> Kasia
>
> > diff --git a/tests/meson.build b/tests/meson.build
> > index 5167a6cc..1b91e5a2 100644
> > --- a/tests/meson.build
> > +++ b/tests/meson.build
> > @@ -3,6 +3,7 @@ test_progs = [
> >
> > 'core_getclient',
> > 'core_getstats',
> > 'core_getversion',
> >
> > + 'core_hot_reload',
> >
> > 'core_setmaster_vs_auth',
> > 'debugfs_test',
> > 'drm_import_export',