[igt-dev] ✗ Fi.CI.IGT: failure for runner: Don't kill a test on taint if watching timeouts

Chris Wilson chris at chris-wilson.co.uk
Fri Dec 4 15:53:04 UTC 2020


Quoting Janusz Krzysztofik (2020-12-04 15:44:01)
> On Fri, 2020-12-04 at 14:35 +0000, Patchwork wrote:
> 
>     Patch Details
> 
>     Series:  runner: Don't kill a test on taint if watching timeouts
>     URL:     https://patchwork.freedesktop.org/series/84577/
>     State:   failure
>     Details: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5249/index.html
> 
>     CI Bug Log - changes from CI_DRM_9441_full -> IGTPW_5249_full
> 
>     Summary
> 
>     FAILURE
> 
>     Serious unknown changes coming with IGTPW_5249_full absolutely need to be
>     verified manually.
> 
>     If you think the reported changes have nothing to do with the changes
>     introduced in IGTPW_5249_full, please notify your bug team to allow them
>     to document this new failure mode, which will reduce false positives in CI.
> 
>     External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_5249/index.html
> 
>     Possible new issues
> 
>     Here are the unknown changes that may have been introduced in
>     IGTPW_5249_full:
> 
>     IGT changes
> 
>     Possible regressions
> 
>       igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
>           ☆ shard-iclb: PASS -> DMESG-WARN
> 
> 
> I can't believe the change in igt_runner to not abort immediately on kernel
> taint could cause the driver to fail with a warning and trigger that taint.
> There must be some other reason.
> 
> 
>     Warnings
> 
>       igt@gem_exec_reloc@basic-parallel:
> 
>           ☆ shard-kbl: TIMEOUT (i915#1729) -> TIMEOUT
> 
>           ☆ shard-tglb: TIMEOUT (i915#1729) -> TIMEOUT
> 
>           ☆ shard-apl: TIMEOUT (i915#1729) -> TIMEOUT
> 
>           ☆ shard-iclb: TIMEOUT (i915#1729) -> TIMEOUT
> 
>           ☆ shard-glk: TIMEOUT (i915#1729) -> TIMEOUT
> 
> Hmm, I have no idea why output from this test is now less complete than before.
> My expectation was we should get more, not less.
> 
> Petri, do you think this may be related to my implementation of the change?

       if (settings->per_test_timeout != 0 &&
-           time_since_subtest > settings->per_test_timeout)
-               return show_kernel_task_state("Per-test timeout exceeded. Killing the current test with SIGQUIT.\n");
+           time_since_subtest > settings->per_test_timeout / decrease) {
+               if (decrease > 1)
+                       return "Killing the test because the kernel is tainted.\n";
+               return "Per-test timeout exceeded. Killing the current test with SIGQUIT.\n";
+       }

        if (settings->inactivity_timeout != 0 &&
-           time_since_activity > settings->inactivity_timeout)
-               return show_kernel_task_state("Inactivity timeout exceeded. Killing the current test with SIGQUIT.\n");
+           time_since_activity > settings->inactivity_timeout / decrease ) {
+               if (decrease > 1)
+                       return "Killing the test because the kernel is tainted.\n";
+               return "Inactivity timeout exceeded. Killing the current test with SIGQUIT.\n";
+       }

The extra information came from show_kernel_task_state(), which the patch
no longer calls on either timeout path.
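
One way to keep that output, as a rough sketch only: call
show_kernel_task_state() on the genuine timeout paths and return the plain
string on the taint path. The struct, the timeout_reason() name, its
parameters and the counting stub are all hypothetical stand-ins for
illustration. The stub assumes the real helper returns the message it is
given, as the removed lines suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the two fields the quoted diff touches. */
struct settings {
	int per_test_timeout;
	int inactivity_timeout;
};

/* Counts calls so the effect is observable in this sketch. */
static int task_state_dumps;

/*
 * Stub for the runner's helper. Assumption: the real one dumps kernel
 * task state and returns the message it was given.
 */
static const char *show_kernel_task_state(const char *msg)
{
	task_state_dumps++;
	return msg;
}

/*
 * Hypothetical helper: returns the reason to kill the test, or NULL.
 * decrease > 1 means the kernel is tainted and timeouts are shortened.
 */
static const char *timeout_reason(const struct settings *settings,
				  int decrease,
				  double time_since_subtest,
				  double time_since_activity)
{
	assert(decrease >= 1); /* guards the divisions below */

	if (settings->per_test_timeout != 0 &&
	    time_since_subtest > settings->per_test_timeout / decrease) {
		if (decrease > 1)
			return "Killing the test because the kernel is tainted.\n";
		/* Genuine timeout: keep the task-state dump. */
		return show_kernel_task_state("Per-test timeout exceeded. Killing the current test with SIGQUIT.\n");
	}

	if (settings->inactivity_timeout != 0 &&
	    time_since_activity > settings->inactivity_timeout / decrease) {
		if (decrease > 1)
			return "Killing the test because the kernel is tainted.\n";
		/* Genuine timeout: keep the task-state dump. */
		return show_kernel_task_state("Inactivity timeout exceeded. Killing the current test with SIGQUIT.\n");
	}

	return NULL;
}
```

Whether the taint path should dump task state as well is a separate
choice; the sketch leaves it out to match the messages in the patch.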
-Chris

