[igt-dev] [PATCH v7 1/2] tests: Add a new test for driver/device hot reload

Katarzyna Dec katarzyna.dec at intel.com
Wed Apr 17 08:53:05 UTC 2019


On Tue, Apr 16, 2019 at 09:17:21AM +0200, Janusz Krzysztofik wrote:
> Hi Kasia,
> 
> On Friday, April 12, 2019 10:16:41 AM CEST Katarzyna Dec wrote:
> > On Thu, Apr 11, 2019 at 02:26:28PM +0200, Janusz Krzysztofik wrote:
> > > From: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> > > 
> > > Run a dummy load in background to put some workload on a device, then try
> > > to either remove (unplug) the device from its bus, or unbind the device's
> > > driver from it, depending on which subtest has been selected.  If
> > > succeeded, unload the driver, rescan the device's bus if needed and
> > > perform health checks on the device with the driver reloaded.
> > > 
> > > The dummy load is run from igt_fixture and in a sub-process, not directly
> > > from subtests,  as it is expected to fail and it's more simple to ignore
> > > igt_abort() in a sub-process.  Moreover, as soon as the sub-process fails
> > > and exits, resources it was using are freed automatically so there is no
> > > need to do any cleanups required for smooth module unload from the test
> > > level itself.  Those cleanups might also make the subtests fail if simply
> > > 
> > > using igt library functions for that instead of reimplementing their safe
> > > parts only.
> > > 
> > > The driver hot unbind / device hot unplug operation is expected to succeed
> > > and the background workload sub-process to die in a reasonable time,
> > > however long timeouts are used to let kernel level timeouts pop up first
> > > if hit by a bug.
> > > 
> > > The dummy load works only on i915 driver.  The test is skipped on other
> > > hardware unless they provide their implementation of igt_spin_batch_new()
> > > and friends.
> > 
> > There are few not needed spaces above, no need to send new version if this
> > will be the only thing (can be fixed during merging).
> 
> OK.
> 
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at intel.com>
> > > ---
> > > 
> > >  tests/Makefile.sources  |   1 +
> > >  tests/core_hot_reload.c | 256 ++++++++++++++++++++++++++++++++++++++++
> > >  tests/meson.build       |   1 +
> > >  3 files changed, 258 insertions(+)
> > >  create mode 100644 tests/core_hot_reload.c
> > > 
> > > diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> > > index 214698da..d2c0941d 100644
> > > --- a/tests/Makefile.sources
> > > +++ b/tests/Makefile.sources
> > > @@ -16,6 +16,7 @@ TESTS_progs = \
> > > 
> > >  	core_getclient \
> > >  	core_getstats \
> > >  	core_getversion \
> > > 
> > > +	core_hot_reload \
> > > 
> > >  	core_setmaster_vs_auth \
> > >  	debugfs_test \
> > >  	drm_import_export \
> > > 
> > > diff --git a/tests/core_hot_reload.c b/tests/core_hot_reload.c
> > > new file mode 100644
> > > index 00000000..d862c99c
> > > --- /dev/null
> > > +++ b/tests/core_hot_reload.c
> > > @@ -0,0 +1,256 @@
> > > +/*
> > > + * Copyright © 2019 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining
> > > a
> > > + * copy of this software and associated documentation files (the
> > > "Software"), + * to deal in the Software without restriction, including
> > > without limitation + * the rights to use, copy, modify, merge, publish,
> > > distribute, sublicense, + * and/or sell copies of the Software, and to
> > > permit persons to whom the + * Software is furnished to do so, subject to
> > > the following conditions: + *
> > > + * The above copyright notice and this permission notice (including the
> > > next + * paragraph) shall be included in all copies or substantial
> > > portions of the + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> > > EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > > MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND
> > > NONINFRINGEMENT.  IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS
> > > BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN
> > > ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN
> > > CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE
> > > SOFTWARE.
> > > + */
> > > +
> > > +#include "igt.h"
> > > +#include "igt_device.h"
> > > +#include "igt_dummyload.h"
> > > +#include "igt_kmod.h"
> > > +#include "igt_sysfs.h"
> > > +
> > > +#include <getopt.h>
> > > +#include <limits.h>
> > > +#include <string.h>
> > > +#include <unistd.h>
> > > +
> > > +
> > 
> > I think that 1 blank line will be fine.
> 
> OK.
> 
> > > +typedef int (*action_t)(int dir);
> > > +typedef void (*workload_wait_t)(void *priv);
> > > +typedef void (*workload_t)(int device, const void *priv);
> > > +
> > > +
> > 
> > same here :)
> 
> OK.
> 
> > > +/*
> > > + * Actions
> > > + *
> > > + * Purpose: make the device disappear
> > > + *
> > > + * @dir: file descriptor of an open device sysfs directory
> > > + *
> > > + * Return value: file descriptor of an open device bus' sysfs directory
> > > + * 		 or -1 if no bus rescan is needed
> > > + */
> > > +
> > 
> > I went though your answers on my quesions. It is more then ok to have a
> > wider overview on what is going on in whole binary, although I would prefer
> > to have some summary doc at the begining and small docs aboe functions (if
> > needed).
> 
> Let's try to agree on that before I submit another version.
> 
> There are now 3 multiline comments, each of them open one of three sections of 
> the code:
> 1) a section with different action functions,
> 2) a section with different workload functions,
> 3) a section with common code, I called it skeleton.
> 
> If I move all those three comments at the top, they will loose their purpose 
> of opening the sections.  That's why I propose to keep them where they are 
> now, and to add another comment at the top.  Here is how it may look like:
> 
>  /**
>  * This binary includes subtests for performing the following two actions:
>  * - unbinding the driver from a device under load,
>  * - virtually unplugging the device under load from its bus.
>  * Both actions are implemented as separate functions sharing the same API,
>  * wrapped with a common skeleton.  A similar approach has been taken for
>  * defining different workloads.  Even if only one type of workload is currently
>  * implemented, the code is ready for extending it with other workloads defined
>  * as additional workload functions which follow the same API.
>  * Since the unplug action actually removes the device instance from the system,
>  * a bus rescan operation is needed to reinstantiate the device after completion
>  * of the test.  For that purpose, the action passes back a file descriptor of
>  * an open sysfs directory representing the bus on which the rescan is then 
>  * performed.  In case of the unbind action, it always returns an invalid file
>  * descriptor to inform the skeleton that no bus rescan is needed.
>  * Since a workload is expected to crash as a result of an action, its
>  * invocation is performed from a helper process run from the igt_fixture block
>  * in order to avoid direct influence of that crash on a test result.
>  */
> 
> What do you think?
It looks a little too long, but it is ok :).
> 
> > > +/* Unbind the driver from the device */
> > > +static int driver_unbind(int dir)
> > > +{
> > > +	char path[PATH_MAX], *dev_bus_addr;
> > > +	int len;
> > > +
> > > +	len = readlinkat(dir, "device", path, sizeof(path) - 1);
> > > +	path[len] = '\0';
> > > +	dev_bus_addr = strrchr(path, '/') + 1;
> > > +
> > > +	igt_set_timeout(60, "Driver unbind timeout!");
> > > +	igt_sysfs_set(dir, "device/driver/unbind", dev_bus_addr);
> > > +
> > > +	/* No need for bus rescan */
> > > +	return -1;
> > > +}
> > > +
> > > +/* Remove (virtually unplug) the device from its bus */
> > > +static int device_unplug(int dir)
> > > +{
> > > +	int bus;
> > > +
> > > +	bus = openat(dir, "device/subsystem", O_DIRECTORY);
> > > +	igt_assert(bus >= 0);
> > > +
> > > +	igt_set_timeout(60, "Device unplug timeout!");
> > > +	igt_sysfs_set(dir, "device/remove", "1");
> > > +
> > > +	return bus;
> > > +}
> > > +
> > > +
> > > +/*
> > > + * Workloads
> > > + *
> > > + * Purpose: Put some long lasting load on the device
> > > + *
> > > + * @device: open device file descriptor,
> > > + * @priv: pointer to an optional argument passed to the workload
> > > + *
> > > + * Return value: none
> > > + */
> > > +
> > > +/* Workload using igt_spin_batch_run() */
> > > +
> > > +static void spin_batch(int device, const void *priv)
> > > +{
> > > +	igt_spin_t *spin;
> > > +
> > > +	/* submit the job */
> > > +	spin = igt_spin_batch_new(device);
> > > +
> > > +	/* wait for the job to crash */
> > > +	gem_sync(device, spin->handle);
> > > +
> > > +	/* clean up if still possible */
> > > +	igt_spin_batch_free(device, spin);
> > > +}
> > > +
> > > +
> > > +/*
> > > + * Skeleton
> > > + */
> > > +
> > > +static void healthcheck(int chipset)
> > > +{
> > > +	if (chipset == DRIVER_INTEL) {
> > > +		/*
> > > +		 * We have it perfectly implemented in i915_module_load,
> > > +		 * just use it.
> > > +		 */
> > > +		igt_assert(igt_system_quiet("i915_module_load --run-subtest reload")
> > > +			   == IGT_EXIT_SUCCESS);
> > > +	} else {
> > > +		/*
> > > +		 * We don't know how to check an unidentified device for health,
> > > +		 * device reopen must suffice.
> > > +		 */
> > > +	}
> > 
> > I saw you answer about this else and I agree that it is not for us to decide
> > what should code for other vendors look like. That is why we do not need
> > else here at all
> 
> OK.
> 
> Janusz
> 
> > - healthcheck will be run only for INTEL + person who runs
> > code will not see any message that 'if' here was not hit.
> > 
> > The rest looks fine.
> > Kasia :)
> 
> 
>
We have a discussion at the office together with knr about this binary. The
outcome was:
* we should not run test from inside this one, so should avoid cases when we run 
'i915_module_load' inside healthcheck, you should add test code - exactly the
functions that you need.
* It would be nice to have bind/unbind scenario with DRIVER_ANY located
somewhere in igt/tests/* (vendor agnostic)
* we have also doubts about second patch. Even if it looks cool, it cannot be
probably used by CI. And if CI is not using it, most probably developers will
not be using it too.

For more questions/clarifications or concerns, please start discussion on
intel-gfx pointing people that you need :) It is the fastest way to have
feedback.

Kasia :)
> 


More information about the igt-dev mailing list