[igt-dev] [PATCH i-g-t] lib/igt_kmod: Allow some leeway in igt_kmod_unload_r
Cavitt, Jonathan
jonathan.cavitt at intel.com
Tue Jan 17 17:56:24 UTC 2023
-----Original Message-----
From: Kamil Konieczny <kamil.konieczny at linux.intel.com>
Sent: Tuesday, January 17, 2023 8:07 AM
To: igt-dev at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Summers, Stuart <stuart.summers at intel.com>; Parupalli, Sandeep Kumar <sandeep.kumar.parupalli at intel.com>; Chris Wilson <chris.p.wilson at linux.intel.com>
Subject: Re: [igt-dev] [PATCH i-g-t] lib/igt_kmod: Allow some leeway in igt_kmod_unload_r
>
> Hi Jonathan,
>
> On 2023-01-09 at 11:25:36 -0800, Jonathan Cavitt wrote:
> > kmod_module_remove_module occasionally returns EAGAIN for mei_gsc
> > in the setup of some gem_lmem_swapping subtests. Just because
> > EAGAIN is returned doesn't mean the module is lost and unable to
> > be unloaded. Try again some number of times (currently 10) before
> > giving up.
> >
> > Retries will occur for -EBUSY and -EAGAIN, as these imply the
> > system is waiting for the target module can simply be waited for.
> > All other errors exit immediately, but as the context for each
> > error informs their relative severity, no warnings will be issued.
> >
> > References: VLK-30287, VLK-39629
> - ^ --------- ^
> Please remove any refs not reachable from internet.
>
> >
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > CC: Stuart Summers <stuart.summers at intel.com>
> > CC: Sandeep Kumar Parupalli <sandeep.kumar.parupalli at intel.com>
> > CC: Chris Wilson <chris.p.wilson at linux.intel.com>
> > Reviewed-by: Chris Wilson <chris.p.wilson at linux.intel.com>
> > ---
> > lib/igt_kmod.c | 41 ++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/igt_kmod.c b/lib/igt_kmod.c
> > index 17090110c..c840f9a17 100644
> > --- a/lib/igt_kmod.c
> > +++ b/lib/igt_kmod.c
> > @@ -253,8 +253,11 @@ out:
> >
> > static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> > {
> > +#define MAX_TRIES 20
> > +#define SLEEP_DURATION 500000
> > struct kmod_list *holders, *pos;
> > - int err = 0;
> > + int err, tries;
> > + const char *mod_name = kmod_module_get_name(kmod);
> >
> > holders = kmod_module_get_holders(kmod);
> > kmod_list_foreach(pos, holders) {
> > @@ -269,7 +272,6 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> > return err;
> >
> > if (igt_kmod_is_loading(kmod)) {
> > - const char *mod_name = kmod_module_get_name(kmod);
>
> Please describe this change, it looks unrelated.
The debug/info messages at the end of this function print the mod_name, so we need to initialize it at the
start of the test, rather than just when igt_kmod_is_loading.
>
> > igt_debug("%s still initializing\n", mod_name);
> > err = igt_wait(!igt_kmod_is_loading(kmod), 10000, 100);
> > if (err < 0) {
> > @@ -279,7 +281,40 @@ static int igt_kmod_unload_r(struct kmod_module *kmod, unsigned int flags)
> > }
> > }
> >
> > - return kmod_module_remove_module(kmod, flags);
> > + for (tries = 0; tries < MAX_TRIES; tries++) {
> > + bool loop = false;
> > +
> > + err = kmod_module_remove_module(kmod, flags);
> > +
> > + /* Only loop in the following cases */
> > + switch (err) {
> > + case -EBUSY:
> > + case -EAGAIN:
> > + /* Waiting for module to be available */
> > + loop = true;
> > + break;
> > + default:
> > + break;
> > + }
> > + if (!loop)
> > + break;
>
> May you turn switch into if() ? Now it looks convoluted.
This was designed to be easily extendable for additional cases where we want to wait for
kmod_module_remove_module to succeed, or if certain other error cases needed
additional management. I'll change it over to this:
loop = err == -EBUSY || err == -EAGAIN;
I don't foresee this condition needing to be extended, but if it does, this line might get
cumbersomely large.
>
> > +
> > + igt_debug("Module %s failed to unload with err: %d on attempt: %i\n",
> > + mod_name, err, tries + 1);
> > +
> > + usleep(SLEEP_DURATION);
> > + }
> > +
> > + if (err == -ENOENT)
> > + igt_debug("Module %s could not be found. Skipping.\n", mod_name);
> ------------------------------------------------------- ^^^^^^^^^^
> Please remove / Skipping./, you print debugs and you will not
> decide what caller will do.
>
> One more error would be if driver is compiled in, so it
> just can not be removed (will it be -ENOTSUP ?).
All other errors should be caught below. We just don't inform on -ENOENT because
that's an expected condition for builtins we won't be fixing, so we issue an igt_debug
instead. In other words, we don't inform the user of -ENOENT errors because the
desired end result for the user is still achieved (module X is not loaded). By
comparison, it sounds like the -ENOTSUP case keeps the module loaded, so we should
probably inform the user the module did not unload.
If this is incorrect, and -ENOTSUP is in the same class of error as -ENOENT, I can add
that to the set of errors we don't directly inform on.
>
> > + else if (err)
> > + igt_info("Module %s failed to unload with err: %d after ~%.1fms\n",
> --------------- ^
> Please make them all into igt_info or igt_debug.
Every error other than -ENOENT (for reasons explained above) is a genuine issue
that should be warned on, but issuing a full warning isn't desirable in most cases,
so we simply inform instead. We cannot hide these issues behind igt_debug.
Additionally, below, the module taking more than one attempt to unload is a
genuine issue the user should be made privy to, so we shouldn't hide it
behind igt_debug either.
It was argued to me in the past that we could just remove the -ENOENT igt_debug
statement entirely. I'll do that and see how that's received, and if it's received
positively, the reason for why igt_debug is used in the -ENOENT case and nowhere
else is a moot distinction.
-Jonathan Cavitt
>
> > + mod_name, err, SLEEP_DURATION*tries/1000.);
> > + else if (tries)
> > + igt_info("Module %s unload took ~%.1fms over %i attempts\n",
> --------------- ^
> Same here, please be consistent.
>
> Regards,
> Kamil
>
> > + mod_name, SLEEP_DURATION*tries/1000., tries + 1);
> > +
> > + return err;
> > }
> >
> > /**
> > --
> > 2.25.1
> >
>
More information about the igt-dev
mailing list