[igt-dev] [PATCH] tests: Add a new test for driver/device hot remove

Krzysztofik, Janusz janusz.krzysztofik at intel.com
Mon Apr 1 08:55:37 UTC 2019


On Monday, April 1, 2019 9:22:34 AM CEST Daniel Vetter wrote:
> On Fri, Mar 29, 2019 at 12:47:32PM +0200, Petri Latvala wrote:
> > On Thu, Mar 28, 2019 at 05:47:19PM +0100, Janusz Krzysztofik wrote:
> > > Run a dummy load in background to put some workload on a device, then
> > > try
> > > to either remove (unplug) the device from its bus, or unbind the
> > > device's
> > > driver from it, depending on which subtest has been selected.
> > > 
> > > The driver hot unbind / device hot unplug operation is expected to
> > > succeed
> > > in a reasonable time, however long timeouts are used to allow kernel
> > > level timeouts pop up first if any.
> > > 
> > > Please note that if running both subtests consecutively, the second one
> > > is always skipped as the device is no longer available as soon as the
> > > first subtest is completed.  The most reliable way to run another
> > > subtest
> > > is to reboot the system first, then select next subtest to be run.
> > 
> > This is a requirement that won't fly for CI use. Is the
> > rebinding/whatever of the device not possible to do? By the test?
> > 
> > Does it also apply to running other test binaries after running the
> > first subtest? As in, is it just a timing issue?
> 
> Yeah like the module reload testcase this testcase needs to restore the
> driver to working state afterwards. I think there's a corresponding rebind
> sysfs file too.

I see your point, however I don't see how that could be done.

I think we can't say that the module reload test restores the driver to 
working state.  It only TRIES to to that, and that's the merit of that test to 
check if module reload succeeds or not.  I think there is no way to implement 
"restore the driver to working state" that would work under all circumstances, 
even if there is something wrong (a bug?) that causes it failing.  In other 
words, I think a user should never assume the driver is in working state after 
the i915_module_load test is run as that operation may simply fail.  The user 
should look at the test result to learn about the driver state.

How the unplug/unbind test should proceed if it occurs there is something 
wrong after module reload?  FAIL? SKIP? Or still SUCCESS, assuming unplug 
itself succeeds? How the test should pass the information on module reload 
results to a user? If we start asserting results of module reload to ensure 
the driver is in working state, that will be a different test, not the 
intended one, I believe.

The best scenario for CI I can see is to schedule i915_module_load just 
following drm_hot_remove.

Am I missing something?

> btw since you started with this, bunch more subtests we need to validate
> this:
> - unbind while an ioctl is called. Especially fun with all the ones that
>   can block. We need a subtest for each ioctl to make sure stuff works
>   correctly. At least, complex ioctl probably need more than that.
> 
> - unplug while atomic commit pending
> 
> - unplug with dma-buf exported, with various external users doing stuff to
>   that dma-buf
> 
> - unplug with dma-fence exported (either as syncpt, drm_syncobj or
>   implicitly attached to a dma-buf), this one probably needs lots of
>   a few variants to cover everything.
> 
> There's probably a lot more (you should discover endless amounts of oops,
> this area is full of bugs), but this should at least get you started.

Many thanks for this list. As an i915 beginner, I was really lost while trying 
to define workloads which are worth of testing.  I'll probably ask some more 
question later if I'm in a position to continue this effort (see below ;-) ).

> And you probably want an entire team of engineers fixing the kernel bugs
> you uncover :-/

:-) I think that's still worth of effort, anyway ;-)

Thanks,
Janusz

> 
> Cheers, Daniel
> 
> > > +	igt_subtest("unplug") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > > +
> > > +		/* Run background workload */
> > > +		spin = igt_spin_batch_new(device, .flags = 
IGT_SPIN_POLL_RUN);
> > 
> > igt_spin_batch_new requires DRIVER_INTEL, doesn't it?
> > 
> > > +		igt_spin_busywait_until_running(spin);
> > > +
> > > +		/* Make the device disappear */
> > > +		igt_set_timeout(60, "Device unplug timeout!");
> > > +		device_unplug(device);
> > > +		igt_reset_timeout();
> > > +
> > > +		close(device);
> > > +		device = -1;
> > > +	}
> > > +
> > > +	igt_subtest("unbind") {
> > > +		igt_spin_t *spin;
> > > +
> > > +		/* Verify if a suitable DRM device/driver exists */
> > > +		igt_skip_on(device < 0);
> > 
> > Ah, I see. The skip if running consecutively is because the previous
> > subtest closed the fd.
> > 
> > Another fixture before this subtest, get the device plugged back,
> > reopen driver?

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.



More information about the igt-dev mailing list