[PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds
Kamil Konieczny
kamil.konieczny at linux.intel.com
Tue Feb 11 09:21:08 UTC 2025
Hi Peter,
On 2025-02-06 at 16:21:47 +0100, Peter Senna Tschudin wrote:
> Commit ddfde25f16ba ("runner: Add support for aborting on network
> failure") introduced a 20 second deadline for the DUT’s network
> to recover after a suspend/resume cycle. If the network isn’t
> back up within that time, igt_runner aborts the test run to save logs
> and prevent potential log loss from an imminent power cycle.
>
> This deadline was set to accommodate our internal CI system, which
> checks for DUT network connectivity every 5 seconds and retries up
> to 3 times at 20 second intervals. If it fails 3 consecutive checks,
This is a little confusing: the first paragraph mentions a
20 second deadline, but here it looks like 60 seconds (3*20).
> it triggers a power cycle on the DUT.
>
> Although our internal CI system can be configured with a longer
-------------- ^^^^^^^^
Remove 'internal'.
> wait time, extending it further would unnecessarily prolong tests
> in cases of DUT hangs.
>
> Bumping the deadline to 40 seconds keeps the abort mechanism safely
imho this should be an option for igt_runner. I would prefer not to
adjust it again later, so let the CI team tune it. The option could be
a time, a retry counter, or both.
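A rough sketch of the time variant, only to illustrate the idea. The
names parse_ping_deadline(), ping_abort_deadline() and the
IGT_PING_ABORT_DEADLINE environment variable are made up for this
example and do not exist in igt_runner; the 40 second default is the
value from this patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Default in seconds, taken from this patch. */
#define PING_ABORT_DEADLINE_DEFAULT 40

/*
 * Hypothetical helper: parse a deadline in seconds from a string
 * (for example an environment variable or an option argument).
 * Returns fallback when text is NULL, empty, not a plain decimal
 * number, not positive, or too large for an int.
 */
static int parse_ping_deadline(const char *text, int fallback)
{
	char *end = NULL;
	long val;

	if (!text || !*text)
		return fallback;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text || *end != '\0' || val <= 0 || val > INT_MAX)
		return fallback;

	return (int)val;
}

/*
 * Hypothetical accessor. IGT_PING_ABORT_DEADLINE is an assumed name,
 * not an existing igt_runner setting.
 */
static int ping_abort_deadline(void)
{
	return parse_ping_deadline(getenv("IGT_PING_ABORT_DEADLINE"),
				   PING_ABORT_DEADLINE_DEFAULT);
}
```

With something like this the CI team could set the value per site, and
a missing or malformed setting would fall back to the built-in default.
A retry counter could be handled the same way.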
> within our internal CI system retry window while improving chances
> of preventing a premature abort. For upstream testing on Jenkins,
> the deadlines vary from 16 to 25 minutes, and this change has
> no impact.
>
> CC: juha-pekka.heikkila at intel.com
> CC: katarzyna.piecielska at intel.com
> CC: ryszard.knop at intel.com
> CC: ewelina.musial at intel.com
> CC: adrinael at adrinael.net
> CC: mateusz.grabski at intel.com
> CC: konrad.b.brodzik at intel.com
imho 'Cc:' is better here.
Regards,
Kamil
> Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
> ---
> runner/executor.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/runner/executor.c b/runner/executor.c
> index 999e7f719..2abb18732 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -218,11 +218,11 @@ static bool load_ping_config_from_env(void)
>
> /*
> * On some hosts, getting network back up after suspend takes
> - * upwards of 10 seconds. 20 seconds should be enough to see
> + * upwards of 10 seconds. 40 seconds should be enough to see
> * if network comes back at all, and hopefully not too long to
> * make external monitoring freak out.
> */
> -#define PING_ABORT_DEADLINE 20
> +#define PING_ABORT_DEADLINE 40
>
> static bool can_ping(void)
> {
> --
> 2.34.1
>