[igt-dev] [PATCH i-g-t 1/2] runner: Refactor timeouting
Petri Latvala
petri.latvala at intel.com
Tue Feb 18 09:08:33 UTC 2020
On Mon, Feb 17, 2020 at 04:50:41PM +0200, Petri Latvala wrote:
> Instead of aiming for inactivity_timeout and splitting that into
> suitable intervals for watchdog pinging, replace the whole logic with
> one-second select() timeouts and checking if we're reaching a timeout
> condition based on current time and the time passed since a particular
> event, be it the last activity or the time of signaling the child
> processes.
>
> With the refactoring, we gain a couple of new features for free:
>
> - use-watchdog now makes sense even without
> inactivity-timeout. Previously use-watchdog was silently ignored if
> inactivity-timeout was not set. Now, watchdogs will be used always if
> configured so, effectively ensuring the device gets rebooted if
> userspace dies without other timeout tracking.
>
> - Killing tests early on kernel taint now happens even
> earlier. Previously on an inactive system we possibly waited for some
> tens of seconds before checking kernel taints.
>
> Signed-off-by: Petri Latvala <petri.latvala at intel.com>
> ---
> runner/executor.c | 224 +++++++++++++++++++++++-----------------------
> 1 file changed, 113 insertions(+), 111 deletions(-)
>
> diff --git a/runner/executor.c b/runner/executor.c
> index 3ea5d167..33610c9e 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -93,7 +93,7 @@ static void init_watchdogs(struct settings *settings)
>
>  	memset(&watchdogs, 0, sizeof(watchdogs));
>
> -	if (!settings->use_watchdog || settings->inactivity_timeout <= 0)
> +	if (!settings->use_watchdog)
>  		return;
>
>  	if (settings->log_level >= LOG_LEVEL_VERBOSE) {
> @@ -672,6 +672,69 @@ static void show_kernel_task_state(void)
>  	sysrq('t');
>  }
>
> +static const char *need_to_timeout(struct settings *settings,
> +				   int killed,
> +				   unsigned long taints,
> +				   double time_since_activity,
> +				   double time_since_kill)
> +{
> +	if (killed) {
> +		/*
> +		 * Timeout after being killed is a hardcoded amount
> +		 * depending on which signal we already used. The
> +		 * exception is SIGKILL which just immediately bails
> +		 * out if the kernel is tainted, because there's
> +		 * little to no hope of the process dying gracefully
> +		 * or at all.
> +		 *
> +		 * Note that if killed == SIGKILL, the caller needs
> +		 * special handling anyway and should ignore the
> +		 * actual string returned.
> +		 */
> +		const double kill_timeout = killed == SIGKILL ? 20.0 : 120.0;
Executing this code in my head a few times, I realized that before this
patch, while we had the exact same timeout values, we would wait
forever for a killed test to die as long as it (or the kernel) kept
producing output within that timeout. Now we don't: the timeout is
measured from the time of the kill, not from the last activity. I
consider that a bugfix.
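
To make the difference concrete, here is a minimal sketch and not the
actual executor.c code. The helper names are made up for illustration,
SIGTERM merely stands in for "some signal other than SIGKILL", and the
strict ">" comparison is an assumption. Only the 20.0 / 120.0 values
come from the hunk quoted above.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper: the hardcoded grace period after signaling,
 * using the same values as kill_timeout in the quoted hunk.
 */
static double kill_grace_period(int killed)
{
	assert(killed != 0); /* only meaningful once a signal was sent */
	return killed == SIGKILL ? 20.0 : 120.0;
}

/*
 * Old behaviour as described above: the deadline was measured from
 * the last activity, so any output from the test (or the kernel)
 * kept pushing it forward.
 */
static bool old_gives_up(int killed, double time_since_activity)
{
	return time_since_activity > kill_grace_period(killed);
}

/*
 * New behaviour: the deadline is measured from the moment the signal
 * was sent, so output no longer extends it.
 */
static bool new_gives_up(int killed, double time_since_kill)
{
	return time_since_kill > kill_grace_period(killed);
}
```

For a test that was signaled 300 seconds ago but still prints something
every 5 seconds, old_gives_up(SIGTERM, 5.0) stays false indefinitely,
while new_gives_up(SIGTERM, 300.0) is true.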
--
Petri Latvala