[Intel-gfx] [PATCH 19/27] drm/i915: Replace hangcheck by heartbeats
Chris Wilson
chris at chris-wilson.co.uk
Fri Sep 27 09:18:41 UTC 2019
Quoting Joonas Lahtinen (2019-09-27 09:26:52)
> Quoting Chris Wilson (2019-09-25 13:01:29)
> > Replace sampling the engine state every so often with a periodic
> > heartbeat request to measure the health of an engine. This is coupled
> > with the forced-preemption to allow long running requests to survive so
> > long as they do not block other users.
> >
> > v2: Couple in sysfs controls
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Reviewed-by: Jon Bloomfield <jon.bloomfield at intel.com>
>
> <SNIP>
>
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -37,3 +37,14 @@ config DRM_I915_PREEMPT_TIMEOUT
> > to execute.
> >
> > May be 0 to disable the timeout.
> > +
> > +config DRM_I915_HEARTBEAT_INTERVAL
> > +	int "Interval between heartbeat pulses (ms)"
> > +	default 2500 # microseconds
>
> "ms" or "us", pick one?
It began with 'm', close enough when copy'n'paste :)
> > +	help
> > +	  While active the driver uses a periodic request, a heartbeat, to
> > +	  check the wellness of the GPU and to regularly flush state changes
> > +	  (idle barriers).
> > +
> > +	  May be 0 to disable heartbeats and therefore disable automatic GPU
> > +	  hang detection.
>
> Worth to mention this can be overridden from sysfs.
The idea of setting 0 here is to disable heartbeats at compile time, so
that the compiler can remove the code by dead-code elimination (DCE).
But any other value could be overridden from sysfs...
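A minimal userspace sketch of that idea, not the i915 code: when the
compile-time interval is the constant 0, every heartbeat path is guarded by
a condition the compiler can prove false, so it can be dropped entirely and
a runtime override has nothing to switch on. HEARTBEAT_INTERVAL_MS stands in
for CONFIG_DRM_I915_HEARTBEAT_INTERVAL; engine_sketch,
set_heartbeat_interval() and heartbeat_tick() are hypothetical names used
only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_DRM_I915_HEARTBEAT_INTERVAL (in ms). */
#ifndef HEARTBEAT_INTERVAL_MS
#define HEARTBEAT_INTERVAL_MS 0
#endif

/* Illustrative engine state; not the real struct intel_engine_cs. */
struct engine_sketch {
	unsigned long heartbeat_interval_ms; /* runtime value, e.g. from sysfs */
	unsigned int pulses_sent;
};

/* Constant-folds to false when the build-time interval is 0. */
static inline bool heartbeat_compiled_in(void)
{
	return HEARTBEAT_INTERVAL_MS != 0;
}

static void engine_init(struct engine_sketch *e)
{
	e->heartbeat_interval_ms = HEARTBEAT_INTERVAL_MS;
	e->pulses_sent = 0;
}

/* Runtime override: accepted only if heartbeats were compiled in. */
static bool set_heartbeat_interval(struct engine_sketch *e, unsigned long ms)
{
	if (!heartbeat_compiled_in())
		return false; /* everything below is dead code when built with 0 */

	e->heartbeat_interval_ms = ms;
	return true;
}

/* One timer tick: emit a pulse unless disabled at build time or at runtime. */
static void heartbeat_tick(struct engine_sketch *e)
{
	if (!heartbeat_compiled_in())
		return;

	if (e->heartbeat_interval_ms)
		e->pulses_sent++;
}
```

Built with -DHEARTBEAT_INTERVAL_MS=2500, the same source keeps both paths
and lets the runtime value be changed, including to 0.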
> > +static void heartbeat(struct work_struct *wrk)
> > +{
>
> <SNIP>
>
> > +	if (i915_modparams.enable_hangcheck)
> > +		engine->heartbeat.systole = i915_request_get(rq);
>
> I'd be more inclined to the userspace opt-in for running indefinitely and
> getting rid of the modparam completely.
That's a separate challenge. :)
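For readers following the thread, the role of that guard can be sketched as
below. This is a simplified userspace model under stated assumptions, not
the patch itself: the worker remembers the last pulse (the "systole") only
when hangcheck is enabled, and on the next tick an unfinished remembered
pulse is what marks the engine as unhealthy. The types pulse and hb_engine
and the function hb_tick() are hypothetical, and the model leaves out
whatever the real worker does before declaring a hang.

```c
#include <stdbool.h>
#include <stddef.h>

enum pulse_result { PULSE_SENT, PULSE_HUNG };

/* Illustrative stand-in for a heartbeat request. */
struct pulse {
	bool completed;
};

/* Illustrative stand-in for the per-engine heartbeat state. */
struct hb_engine {
	struct pulse *systole;  /* outstanding heartbeat, NULL if untracked */
	bool enable_hangcheck;  /* models i915_modparams.enable_hangcheck */
};

/* One run of the heartbeat worker, submitting 'next' as the new pulse. */
static enum pulse_result hb_tick(struct hb_engine *e, struct pulse *next)
{
	if (e->systole) {
		/* The previous pulse never completed: the engine looks stuck. */
		if (!e->systole->completed)
			return PULSE_HUNG;
		e->systole = NULL;
	}

	next->completed = false;

	/* Only track the pulse when hang detection is wanted. */
	if (e->enable_hangcheck)
		e->systole = next;

	return PULSE_SENT;
}
```

With enable_hangcheck cleared, pulses are still sent (so state is still
flushed), but nothing is remembered and a stalled engine is never reported.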
> The long workloads might even not pre-empt at desired granularity.
Indeed, but as that is a QoS impact on another user, I would similarly
suggest that opting out of this is akin to setting yourself to be
non-preemptible, ergo a restricted operation.
-Chris