[PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds

Kamil Konieczny kamil.konieczny at linux.intel.com
Tue Feb 11 09:21:08 UTC 2025


Hi Peter,
On 2025-02-06 at 16:21:47 +0100, Peter Senna Tschudin wrote:
> Commit ddfde25f16ba ("runner: Add support for aborting on network
> failure") introduced a 20 second deadline for the DUT’s network
> to recover after a suspend/resume cycle. If the network isn’t
> back up within that time, igt_runner aborts the test run to save logs
> and prevent potential log loss from an imminent power cycle.
> 
> This deadline was set to accommodate our internal CI system, which
> checks for DUT network connectivity every 5 seconds and retries up
> to 3 times at 20 second intervals. If it fails 3 consecutive checks,

This is a little confusing: in the first paragraph you wrote about a
20-second deadline, but here it looks like 60 seconds (3*20).
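To make the discrepancy concrete, here is a small C sketch of the arithmetic. It assumes, as I read it, that the CI budget is simply the retry count multiplied by the retry interval. The macro and function names are hypothetical and come from neither runner/executor.c nor the CI system:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * CI retry policy as described in the commit message. Treating the
 * time budget as retries * interval (3 * 20 = 60 s) is an assumption,
 * not taken from the CI system's code.
 */
#define CI_RETRIES          3
#define CI_RETRY_INTERVAL_S 20

/* Seconds the CI system waits before power cycling the DUT. */
static int ci_window_s(int retries, int interval_s)
{
	assert(retries > 0 && interval_s > 0);
	return retries * interval_s;
}

/* True if the runner aborts (and saves logs) before the CI power cycle. */
static bool abort_precedes_power_cycle(int deadline_s)
{
	return deadline_s < ci_window_s(CI_RETRIES, CI_RETRY_INTERVAL_S);
}
```

With these numbers both the current 20-second and the proposed 40-second deadline expire inside the assumed 60-second window, while a 60-second deadline would not.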

> it triggers a power cycle on the DUT.
> 
> Although our internal CI system can be configured with a longer
-------------- ^^^^^^^^
Remove this.

> wait time, extending it further would unnecessarily prolong tests
> in cases of DUT hangs.
> 
> Bumping the deadline to 40 seconds keeps the abort mechanism safely

imho this should be an option for igt-runner; I would prefer not to
adjust it again later and instead let the CI team tune it. The option
could be a time, a retry counter, or both.
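One possible shape for such an option, sketched in C. The runner already loads ping settings from the environment (load_ping_config_from_env()), so the deadline could be read the same way. The variable name IGT_PING_ABORT_DEADLINE and the helper names below are hypothetical. A retry-counter option could be parsed with the same helper:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define PING_ABORT_DEADLINE_DEFAULT 40 /* seconds, value from this patch */

/*
 * Parse a positive number of seconds from @s. Returns @fallback when
 * @s is NULL, empty, not a plain decimal number, or out of range.
 */
static int parse_seconds(const char *s, int fallback)
{
	char *end;
	long v;

	assert(fallback > 0);
	if (!s || !*s)
		return fallback;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v <= 0 || v > INT_MAX)
		return fallback;

	return (int)v;
}

/* IGT_PING_ABORT_DEADLINE is a hypothetical variable name. */
static int ping_abort_deadline(void)
{
	return parse_seconds(getenv("IGT_PING_ABORT_DEADLINE"),
			     PING_ABORT_DEADLINE_DEFAULT);
}
```

Falling back to the built-in default on malformed input means a typo in the CI configuration cannot silently disable the abort.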

> within our internal CI system retry window while improving chances
> of preventing a premature abort. For upstream testing on Jenkins,
> the deadlines vary from 16 to 25 minutes, and this change has
> no impact.
> 
> CC: juha-pekka.heikkila at intel.com
> CC: katarzyna.piecielska at intel.com
> CC: ryszard.knop at intel.com
> CC: ewelina.musial at intel.com
> CC: adrinael at adrinael.net
> CC: mateusz.grabski at intel.com
> CC: konrad.b.brodzik at intel.com

imho 'Cc:' would be better here.

Regards,
Kamil

> Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
> ---
>  runner/executor.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/runner/executor.c b/runner/executor.c
> index 999e7f719..2abb18732 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -218,11 +218,11 @@ static bool load_ping_config_from_env(void)
>  
>  /*
>   * On some hosts, getting network back up after suspend takes
> - * upwards of 10 seconds. 20 seconds should be enough to see
> + * upwards of 10 seconds. 40 seconds should be enough to see
>   * if network comes back at all, and hopefully not too long to
>   * make external monitoring freak out.
>   */
> -#define PING_ABORT_DEADLINE 20
> +#define PING_ABORT_DEADLINE 40
>  
>  static bool can_ping(void)
>  {
> -- 
> 2.34.1
> 
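For reference, here is a minimal sketch of how a deadline such as PING_ABORT_DEADLINE can bound the wait for the network after resume. This is not the actual loop in runner/executor.c. The function wait_for_network(), the 100 ms poll interval and the stub probes are assumptions for illustration. The probe stands in for something like can_ping():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Elapsed seconds between two CLOCK_MONOTONIC readings. */
static double elapsed_s(const struct timespec *start, const struct timespec *now)
{
	return (double)(now->tv_sec - start->tv_sec) +
	       (double)(now->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Call @probe until it succeeds or @deadline_s seconds have passed.
 * Returns true if the network came back, false if the caller should abort.
 * The 100 ms poll interval is an arbitrary choice for this sketch.
 */
static bool wait_for_network(bool (*probe)(void), int deadline_s)
{
	const struct timespec poll = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
	struct timespec start, now;

	assert(probe && deadline_s > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	for (;;) {
		if (probe())
			return true;

		clock_gettime(CLOCK_MONOTONIC, &now);
		if (elapsed_s(&start, &now) >= deadline_s)
			return false;

		nanosleep(&poll, NULL);
	}
}

/* Stand-in probes for trying the sketch without a network. */
static bool always_up(void) { return true; }
static bool never_up(void) { return false; }
```

In the runner the call would be something like wait_for_network(can_ping, PING_ABORT_DEADLINE), aborting the run when it returns false. Measuring with CLOCK_MONOTONIC keeps the deadline unaffected by wall-clock jumps, such as a time correction right after resume.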


More information about the igt-dev mailing list