[Intel-gfx] [PATCH v2] drm/i915/selftests: Bump the scheduling error threshold for fast heartbeats
Mika Kuoppala
mika.kuoppala at linux.intel.com
Wed Jan 13 16:00:09 UTC 2021
Chris Wilson <chris at chris-wilson.co.uk> writes:
> Since we are on system_highpri_wq, we expected the heartbeat to be
> scheduled promptly. However, we see delays of over 10ms upsetting our
> assertions. Accept this as inevitable and bump the minimum error
> threshold to 20ms (from 6 jiffies).
>
> <6> [616.784749] rcs0: Heartbeat delay: 3570us [2802, 9188]
> <6> [616.807790] bcs0: Heartbeat delay: 2111us [745, 4372]
> <6> [616.853776] vcs0: Heartbeat delay: 6485us [2424, 11637]
> <3> [616.859296] vcs0: Heartbeat delay was 6485us, expected less than 6000us
> <3> [616.860901] i915/intel_heartbeat_live_selftests: live_heartbeat_fast failed with error -22
>
> v2: More context from CI.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
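For a sense of scale, the new threshold arithmetic can be modelled in plain C. This is a sketch under one stated assumption: `model_jiffies_to_usecs()` and `model_error_threshold()` are hypothetical helpers, and the conversion only covers HZ values that divide 1000000 exactly (100, 250, 1000). The kernel's jiffies_to_usecs() also handles other HZ values. At HZ=1000, six jiffies is 6000us, which matches the failing CI line, so the 20ms floor takes over. At HZ=100, the old six-jiffy bound (60000us) is already larger and stays in effect.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model of jiffies_to_usecs(): assumes HZ
 * divides 1000000 exactly (true for HZ = 100, 250, 1000). The real
 * kernel helper handles the general case; this sketch does not.
 */
static unsigned int model_jiffies_to_usecs(unsigned int j, unsigned int hz)
{
	assert(hz > 0 && 1000000u % hz == 0);
	return j * (1000000u / hz);
}

/* Hypothetical mirror of max(20000u, jiffies_to_usecs(6)) from the patch. */
static unsigned int model_error_threshold(unsigned int hz)
{
	unsigned int tick_bound = model_jiffies_to_usecs(6, hz);

	return tick_bound > 20000u ? tick_bound : 20000u;
}
```

Under this model, the 6485us median from the vcs0 log sits below the 20000us threshold at HZ=1000, so it would no longer trip the check.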
> ---
> drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index b88aa35ad75b..223ab88f7e57 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -197,6 +197,7 @@ static int cmp_u32(const void *_a, const void *_b)
>
> static int __live_heartbeat_fast(struct intel_engine_cs *engine)
> {
> + const unsigned int error_threshold = max(20000u, jiffies_to_usecs(6));
> struct intel_context *ce;
> struct i915_request *rq;
> ktime_t t0, t1;
> @@ -254,12 +255,18 @@ static int __live_heartbeat_fast(struct intel_engine_cs *engine)
> times[0],
> times[ARRAY_SIZE(times) - 1]);
>
> - /* Min work delay is 2 * 2 (worst), +1 for scheduling, +1 for slack */
> - if (times[ARRAY_SIZE(times) / 2] > jiffies_to_usecs(6)) {
> + /*
> + * Ideally, the upper bound on min work delay would be something like
> + * 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
> + * are, even with system_highpri_wq, at the mercy of the CPU scheduler
> + * and may be stuck behind some slow work for many milliseconds, such
> + * as our very own display workers.
> + */
> + if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
> pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n",
> engine->name,
> times[ARRAY_SIZE(times) / 2],
> - jiffies_to_usecs(6));
> + error_threshold);
> err = -EINVAL;
> }
>
> --
> 2.20.1
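The check itself compares the middle element of the sorted samples (the median, for an odd number of samples) against the threshold. Below is a user-space sketch that uses qsort() in place of the kernel's sort(). `cmp_uint()` and `check_median_delay()` are hypothetical names. The comparator only plays the role of cmp_u32(), whose body is not shown in the diff, and -1 stands in for the selftest's -EINVAL. The sample values are the three figures from the vcs0 log line, not the selftest's full sample array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparator for qsort(); plays the role of cmp_u32(). */
static int cmp_uint(const void *_a, const void *_b)
{
	unsigned int a = *(const unsigned int *)_a;
	unsigned int b = *(const unsigned int *)_b;

	/* Avoid returning a - b, which can wrap for unsigned values. */
	return (a > b) - (a < b);
}

/*
 * Hypothetical helper: sort the delay samples (in microseconds) and
 * fail if the middle element exceeds the threshold, like the patched
 * times[ARRAY_SIZE(times) / 2] > error_threshold test.
 * Returns 0 on success, -1 on failure.
 */
static int check_median_delay(unsigned int *times, size_t n,
			      unsigned int threshold)
{
	assert(n > 0);
	qsort(times, n, sizeof(*times), cmp_uint);
	return times[n / 2] > threshold ? -1 : 0;
}
```

Checking the median rather than the maximum means a single badly delayed sample, such as the 11637us outlier in the vcs0 log, does not fail the test on its own; only a consistently slow heartbeat does.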