[igt-dev] [PATCH i-g-t] lib/igt_kmod: Allow some leeway in igt_kmod_unload_r

Cavitt, Jonathan jonathan.cavitt at intel.com
Tue Jan 17 17:56:24 UTC 2023


-----Original Message-----
From: Kamil Konieczny <kamil.konieczny at linux.intel.com> 
Sent: Tuesday, January 17, 2023 8:07 AM
To: igt-dev at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Summers, Stuart <stuart.summers at intel.com>; Parupalli, Sandeep Kumar <sandeep.kumar.parupalli at intel.com>; Chris Wilson <chris.p.wilson at linux.intel.com>
Subject: Re: [igt-dev] [PATCH i-g-t] lib/igt_kmod: Allow some leeway in igt_kmod_unload_r
> 
> Hi Jonathan,
> 
> On 2023-01-09 at 11:25:36 -0800, Jonathan Cavitt wrote:
> > kmod_module_remove_module occasionally returns EAGAIN for mei_gsc
> > in the setup of some gem_lmem_swapping subtests.  Just because
> > EAGAIN is returned doesn't mean the module is lost and unable to
> > be unloaded.  Try again some number of times (currently 10) before
> > giving up.
> > 
> > Retries will occur for -EBUSY and -EAGAIN, as these imply the
> > system is waiting for the target module can simply be waited for.
> > All other errors exit immediately, but as the context for each
> > error informs their relative severity, no warnings will be issued.
> > 
> > References: VLK-30287, VLK-39629
> - ^ --------- ^
> Please remove any refs not reachable from internet.
> 
> > 
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > CC: Stuart Summers <stuart.summers at intel.com>
> > CC: Sandeep Kumar Parupalli <sandeep.kumar.parupalli at intel.com>
> > CC: Chris Wilson <chris.p.wilson at linux.intel.com>
> > Reviewed-by: Chris Wilson <chris.p.wilson at linux.intel.com>
> > ---
> >  lib/igt_kmod.c | 41 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 38 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
> > index 17090110c..c840f9a17 100644
> > --- a/lib/igt_kmod.c
> > +++ b/lib/igt_kmod.c
> > @@ -253,8 +253,11 @@ out:
> >  
> >  static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> >  {
> > +#define MAX_TRIES	20
> > +#define SLEEP_DURATION	500000
> >  	struct kmod_list *holders, *pos;
> > -	int err = 0;
> > +	int err, tries;
> > +	const char *mod_name = kmod_module_get_name(kmod);
> >  
> >  	holders = kmod_module_get_holders(kmod);
> >  	kmod_list_foreach(pos, holders) {
> > @@ -269,7 +272,6 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> >  		return err;
> >  
> >  	if (igt_kmod_is_loading(kmod)) {
> > -		const char *mod_name = kmod_module_get_name(kmod);
> 
> Please describe this change, it looks unrelated.

The debug/info messages at the end of this function print the mod_name, so we need to initialize it at the
start of the test, rather than just when igt_kmod_is_loading.

> 
> >  		igt_debug("%s still initializing\n", mod_name);
> >  		err = igt_wait(!igt_kmod_is_loading(kmod), 10000, 100);
> >  		if (err < 0) {
> > @@ -279,7 +281,40 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> >  		}
> >  	}
> >  
> > -	return kmod_module_remove_module(kmod, flags);
> > +	for (tries = 0; tries < MAX_TRIES; tries++) {
> > +		bool loop = false;
> > +
> > +		err = kmod_module_remove_module(kmod, flags);
> > +
> > +		/* Only loop in the following cases */
> > +		switch (err) {
> > +		case -EBUSY:
> > +		case -EAGAIN:
> > +			/* Waiting for module to be available */
> > +			loop = true;
> > +			break;
> > +		default:
> > +			break;
> > +                }
> > +		if (!loop)
> > +			break;
> 
> May you turn switch into if() ? Now it looks convoluted.

This was designed to be easily extendable for additional cases where we want to wait for
kmod_module_remove_module to succeed, or if certain other error cases needed
additional management.  I'll change it over to this:

		loop = err == -EBUSY || err == -EAGAIN;

I don't foresee this condition needing to be extended, but if it does, this line might get
cumbersomely large.

> 
> > +
> > +		igt_debug("Module %s failed to unload with err: %d on attempt: %i\n",
> > +			  mod_name, err, tries + 1);
> > +
> > +		usleep(SLEEP_DURATION);
> > +	}
> > +
> > +	if (err == -ENOENT)
> > +		igt_debug("Module %s could not be found.  Skipping.\n", mod_name);
> ------------------------------------------------------- ^^^^^^^^^^
> Please remove /  Skipping./, you print debugs and you will not
> decide what caller will do.
> 
> One more error would be if driver is compiled in, so it
> just can not be removed (will it be -ENOTSUP ?).

All other errors should be caught below.  We just don't inform on -ENOENT because
that's an expected condition for builtins we won't be fixing, so we issue an igt_debug
instead.  In other words, we don't inform the user of -ENOENT errors because the
desired end result for the user is still achieved (module X is not loaded).  By
comparison, it sounds like the -ENOTSUP case keeps the module loaded, so we should
probably inform the user the module did not unload.

If this is incorrect, and -ENOTSUP is in the same class of error as -ENOENT, I can add
that to the set of errors we don't directly inform on.

> 
> > +	else if (err)
> > +		igt_info("Module %s failed to unload with err: %d after ~%.1fms\n",
> --------------- ^
> Please make them all into igt_info or igt_debug.

Every error other than -ENOENT (for reasons explained above) is a genuine issue
that should be warned on, but issuing a full warning isn't desirable in most cases,
so we simply inform instead.  We cannot hide these issues behind igt_debug.

Additionally, below, the module taking more than one attempt to unload is a
genuine issue the user should be made privy to, so we shouldn't hide it
behind igt_debug either.

It was argued to me in the past that we could just remove the -ENOENT igt_debug
statement entirely.  I'll do that and see how that's received, and if it's received
positively, the reason for why igt_debug is used in the -ENOENT case and nowhere
else is a moot distinction.
-Jonathan Cavitt

> 
> > +			 mod_name, err, SLEEP_DURATION*tries/1000.);
> > +	else if (tries)
> > +		igt_info("Module %s unload took ~%.1fms over %i attempts\n",
> --------------- ^
> Same here, please be consistent.
> 
> Regards,
> Kamil
> 
> > +			 mod_name, SLEEP_DURATION*tries/1000., tries + 1);
> > +
> > +	return err;
> >  }
> >  
> >  /**
> > -- 
> > 2.25.1
> > 
> 


More information about the igt-dev mailing list