[PATCH] drm/i915/selftest_hangcheck: Check sanity with more patience
Andi Shyti
andi.shyti at linux.intel.com
Tue Mar 5 17:01:41 UTC 2024
Hi Janusz,
On Wed, Feb 28, 2024 at 04:24:41PM +0100, Janusz Krzysztofik wrote:
> While trying to reproduce some other issues reported by CI for i915
> hangcheck live selftest, I found them hidden behind timeout failures
> reported by igt_hang_sanitycheck -- the very first hangcheck test case
> executed.
>
> Feb 22 19:49:06 DUT1394ACMR kernel: calling mei_gsc_driver_init+0x0/0xff0 [mei_gsc] @ 121074
> Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG enabled
> Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] Cannot find any crtc or sizes
> Feb 22 19:49:06 DUT1394ACMR kernel: probe of i915.mei-gsc.768 returned 0 after 1475 usecs
> Feb 22 19:49:06 DUT1394ACMR kernel: probe of i915.mei-gscfi.768 returned 0 after 1441 usecs
> Feb 22 19:49:06 DUT1394ACMR kernel: initcall mei_gsc_driver_init+0x0/0xff0 [mei_gsc] returned 0 after 3010 usecs
> Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG_GEM enabled
> Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG_RUNTIME_PM enabled
> Feb 22 19:49:06 DUT1394ACMR kernel: i915: Performing live selftests with st_random_seed=0x4c26c048 st_timeout=500
> Feb 22 19:49:07 DUT1394ACMR kernel: i915: Running hangcheck
> Feb 22 19:49:07 DUT1394ACMR kernel: calling mei_hdcp_driver_init+0x0/0xff0 [mei_hdcp] @ 121074
> Feb 22 19:49:07 DUT1394ACMR kernel: i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
> Feb 22 19:49:07 DUT1394ACMR kernel: probe of 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04 returned 0 after 1398 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: probe of i915.mei-gsc.768-b638ab7e-94e2-4ea2-a552-d1c54b627f04 returned 0 after 97 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: initcall mei_hdcp_driver_init+0x0/0xff0 [mei_hdcp] returned 0 after 101960 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: calling mei_pxp_driver_init+0x0/0xff0 [mei_pxp] @ 121094
> Feb 22 19:49:07 DUT1394ACMR kernel: probe of 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1 returned 0 after 435 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:03:00.0 (ops i915_pxp_tee_component_ops [i915])
> Feb 22 19:49:07 DUT1394ACMR kernel: 100ms wait for request failed on rcs0, err=-62
> Feb 22 19:49:07 DUT1394ACMR kernel: probe of i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1 returned 0 after 158425 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: initcall mei_pxp_driver_init+0x0/0xff0 [mei_pxp] returned 0 after 224159 usecs
> Feb 22 19:49:07 DUT1394ACMR kernel: i915/intel_hangcheck_live_selftests: igt_hang_sanitycheck failed with error -5
> Feb 22 19:49:07 DUT1394ACMR kernel: i915: probe of 0000:03:00.0 failed with error -5
>
> Those request waits, once timed out after 100ms, have never been observed
> to remain pending after another 100ms: each of them completed within
> twice the originally requested wait time.
>
> New auxiliary drivers that did not exist before are now loaded in parallel
> with the i915 module, also when it is loaded in selftest mode, and may
> generate significant additional concurrent workload.  Taking that into
> account, relax our expectations on the time the sanity check request may
> consume before it completes.
>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
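
For readers without the diff at hand, the idea described in the commit
message (a request that misses the first 100ms window still completes within
a second one) can be sketched in plain user-space C as below. This is only an
illustration under stated assumptions: fake_request, fake_request_wait,
wait_with_patience and SKETCH_ETIME are made-up names, the clock is simulated,
and none of this is the actual patch or the i915 API.

```c
#include <stdbool.h>

/* Assumption: mirrors the err=-62 seen in the log (ETIME on Linux). */
#define SKETCH_ETIME 62

/* Hypothetical stand-in for a GPU request with a simulated clock. */
struct fake_request {
	unsigned int completes_at_ms;	/* when the request would finish */
	unsigned int elapsed_ms;	/* time already spent waiting */
};

/*
 * Simulated bounded wait: advance the clock by at most timeout_ms and
 * report whether the request completed within that window.
 */
bool fake_request_wait(struct fake_request *rq, unsigned int timeout_ms)
{
	unsigned int remaining = 0;

	if (rq->completes_at_ms > rq->elapsed_ms)
		remaining = rq->completes_at_ms - rq->elapsed_ms;

	if (remaining <= timeout_ms) {
		rq->elapsed_ms += remaining;
		return true;
	}

	rq->elapsed_ms += timeout_ms;
	return false;
}

/*
 * Wait in up to `attempts` windows of timeout_ms each.  Returns 0 once the
 * request completes, or -SKETCH_ETIME if it is still pending afterwards.
 */
int wait_with_patience(struct fake_request *rq, unsigned int timeout_ms,
		       unsigned int attempts)
{
	unsigned int i;

	for (i = 0; i < attempts; i++) {
		if (fake_request_wait(rq, timeout_ms))
			return 0;
	}

	return -SKETCH_ETIME;
}
```

With a simulated request that needs 150ms, a single 100ms window reports
-SKETCH_ETIME, while two windows let it complete, which matches the
behaviour reported above.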
Pushed to drm-intel-gt-next.
Thank you,
Andi