[igt-dev] [PATCH] tests: Add a new test for driver/device hot remove

Mon Apr 1 10:16:02 UTC 2019

On Mon, Apr 01, 2019 at 08:55:37AM +0000, Krzysztofik, Janusz wrote:
> On Monday, April 1, 2019 9:22:34 AM CEST Daniel Vetter wrote:
> > On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > > Run a dummy load in the background to put some workload on a device,
> > > > then try to either remove (unplug) the device from its bus, or unbind
> > > > the device's driver from it, depending on which subtest has been
> > > > selected.
> > > > 
> > > > The driver hot unbind / device hot unplug operation is expected to
> > > > succeed in a reasonable time; however, long timeouts are used so that
> > > > kernel-level timeouts, if any, pop up first.
> > > > 
> > > > Please note that if both subtests are run consecutively, the second one
> > > > is always skipped, as the device is no longer available once the first
> > > > subtest has completed.  The most reliable way to run another subtest is
> > > > to reboot the system first, then select the next subtest to be run.
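For reference, both subtests come down to a single write to a standard Linux PCI sysfs attribute. A minimal sketch, assuming the i915 driver and an example PCI address of 0000:00:02.0; `hotremove_target()` and `enum hotremove_op` are hypothetical names, neither the patch's code nor IGT library API, and the sysfs root is a parameter only so the paths can be checked without touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, not IGT API: the two operations the subtests exercise. */
enum hotremove_op {
	OP_UNBIND,	/* detach the driver, leave the device on the bus */
	OP_UNPLUG,	/* remove the device from the PCI bus altogether */
};

/*
 * Build the sysfs attribute path for @op and select the value to write:
 *   unbind: "<pci_addr>" -> <root>/bus/pci/drivers/<driver>/unbind
 *   unplug: "1"          -> <root>/bus/pci/devices/<pci_addr>/remove
 * Returns false if the path does not fit in @buf.
 */
static bool hotremove_target(enum hotremove_op op, const char *sysfs_root,
			     const char *driver, const char *pci_addr,
			     char *buf, size_t len, const char **value)
{
	int n;

	assert(buf && len > 0 && value);

	if (op == OP_UNBIND) {
		n = snprintf(buf, len, "%s/bus/pci/drivers/%s/unbind",
			     sysfs_root, driver);
		*value = pci_addr;
	} else {
		n = snprintf(buf, len, "%s/bus/pci/devices/%s/remove",
			     sysfs_root, pci_addr);
		*value = "1";
	}
	return n >= 0 && (size_t)n < len;
}
```

In the subtest, the resulting value would be written to the returned path (with `sysfs_root` set to "/sys") while the dummy load is still running.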
> > > 
> > > This is a requirement that won't fly for CI use. Is the
> > > rebinding/whatever of the device not possible to do? By the test?
> > > 
> > > Does it also apply to running other test binaries after running the
> > > first subtest? As in, is it just a timing issue?
> > 
> > Yeah, like the module reload testcase, this testcase needs to restore the
> > driver to a working state afterwards. I think there's a corresponding
> > rebind sysfs file too.
> 
> I see your point, however I don't see how that could be done.
> 
> I think we can't say that the module reload test restores the driver to a
> working state.  It only TRIES to do that, and the merit of that test is
> checking whether module reload succeeds or not.  I think there is no way to
> implement "restore the driver to working state" that would work under all
> circumstances, even if something is wrong (a bug?) that causes it to fail.
> In other words, I think a user should never assume the driver is in working
> state after the i915_module_load test is run, as that operation may simply
> fail.  The user should look at the test result to learn about the driver
> state.
> 

The best scenario is that driver restore is attempted. If it doesn't
work, we blacklist this test in CI, or don't merge it. If it does work,
all is fine, but we still take the same care with this test as with the
module reload tests.
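A restore attempt could look roughly like this: after an unbind, write the device address to the driver's `bind` attribute; after an unplug, ask the PCI core to rediscover the device through `/sys/bus/pci/rescan`. A minimal sketch under those assumptions; `hotremove_restore()` and `sysfs_write()` are hypothetical helpers, not existing IGT code, and the sysfs root is a parameter for testability.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, mirrors the two subtests: driver unbind vs device unplug. */
enum hotremove_op { OP_UNBIND, OP_UNPLUG };

/* Write @value to a sysfs attribute; returns 0 or a negative errno. */
static int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs(value, f) == EOF)
		err = -EIO;
	/* The kernel sees the data when stdio flushes it, i.e. in fclose(). */
	if (fclose(f) == EOF && !err)
		err = errno ? -errno : -EIO;
	return err;
}

/*
 * Attempt to bring the device back after a subtest:
 *   after unbind: "<pci_addr>" -> <root>/bus/pci/drivers/<driver>/bind
 *   after unplug: "1"          -> <root>/bus/pci/rescan
 * Returns 0 or a negative errno. Success only means the kernel accepted
 * the write, not necessarily that the driver is fully functional again.
 */
static int hotremove_restore(enum hotremove_op op, const char *sysfs_root,
			     const char *driver, const char *pci_addr)
{
	char path[4096];
	int n;

	assert(sysfs_root && driver && pci_addr);

	if (op == OP_UNBIND)
		n = snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/bind",
			     sysfs_root, driver);
	else
		n = snprintf(path, sizeof(path), "%s/bus/pci/rescan",
			     sysfs_root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	return sysfs_write(path, op == OP_UNBIND ? pci_addr : "1");
}
```

Whether the restored driver actually works is a separate question, which is exactly the point raised above about module reload.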

Even as things currently stand, we don't run the module reloading tests
in the sharded runs (Fi.CI.IGT), only as part of BAT (Fi.CI.BAT), as
the last tests to execute. Shards run some of the selftests, but as a
separate shard.

Btw, the selftests do _not_ require the machine to be rebooted after
them, or between them. When they succeed, all is fine. FSVO fine. We
run all selftests and then reboot implicitly, as they are the last
things to run on a particular kernel. Sometimes we don't. The
requirement to reboot after any selftests, which you mentioned in
another mail, is due to the particular environment they are run in,
which doesn't cope well with reloading the kernel module.

> How should the unplug/unbind test proceed if it turns out something is
> wrong after module reload?  FAIL? SKIP? Or still SUCCESS, assuming the
> unplug itself succeeds? How should the test pass information on module
> reload results to a user? If we start asserting the results of module
> reload to ensure the driver is in working state, that will be a different
> test, not the intended one, I believe.

igt_warn() I'd say.
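In code terms, that separates the verdict from the restore outcome: the subtest passes or fails on the unbind/unplug alone, and a failed restore only produces a warning. A minimal sketch; `hotremove_verdict()` and `enum verdict` are hypothetical, and the `fprintf(stderr, ...)` call stands in for IGT's `igt_warn()` so the example builds without IGT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical verdict type; real IGT subtests report through
 * igt_assert(), igt_skip() and friends, not through a return value. */
enum verdict { VERDICT_PASS, VERDICT_FAIL };

/*
 * @unplug_err:  result of the unbind/unplug under test (0 or -errno)
 * @restore_err: result of the follow-up restore attempt (0 or -errno)
 */
static enum verdict hotremove_verdict(int unplug_err, int restore_err)
{
	assert(unplug_err <= 0 && restore_err <= 0);

	/* Only the operation under test decides pass/fail. */
	if (unplug_err)
		return VERDICT_FAIL;

	/* A failed restore is reported, not failed; stand-in for igt_warn(). */
	if (restore_err)
		fprintf(stderr, "WARNING: driver restore failed (%d), "
			"subsequent tests may be affected\n", restore_err);

	return VERDICT_PASS;
}
```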

-- 
Petri Latvala