[Intel-gfx] [PATCH 19/27] drm/i915: Replace hangcheck by heartbeats

Chris Wilson chris at chris-wilson.co.uk
Fri Sep 27 09:18:41 UTC 2019


Quoting Joonas Lahtinen (2019-09-27 09:26:52)
> Quoting Chris Wilson (2019-09-25 13:01:29)
> > Replace sampling the engine state every so often with a periodic
> > heartbeat request to measure the health of an engine. This is coupled
> > with the forced-preemption to allow long running requests to survive so
> > long as they do not block other users.
> > 
> > v2: Couple in sysfs controls
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Reviewed-by: Jon Bloomfield <jon.bloomfield at intel.com>
> 
> <SNIP>
> 
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -37,3 +37,14 @@ config DRM_I915_PREEMPT_TIMEOUT
> >           to execute.
> >  
> >           May be 0 to disable the timeout.
> > +
> > +config DRM_I915_HEARTBEAT_INTERVAL
> > +       int "Interval between heartbeat pulses (ms)"
> > +       default 2500 # microseconds
> 
> "ms" or "us", pick one?

It began with 'm', close enough when copy'n'paste :)

> > +       help
> > +         While active the driver uses a periodic request, a heartbeat, to
> > +         check the wellness of the GPU and to regularly flush state changes
> > +         (idle barriers).
> > +
> > +         May be 0 to disable heartbeats and therefore disable automatic GPU
> > +         hang detection.
> 
> Worth mentioning that this can be overridden from sysfs.

The idea of setting 0 here is to disable it at compile time, so that the
heartbeat code is removed by dead-code elimination (DCE).

But any other value could be overridden...
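
To illustrate the compile-time versus runtime distinction, here is a
minimal sketch in plain C. It is not the driver code: the macro
HEARTBEAT_INTERVAL_MS, struct engine_sketch and next_heartbeat_delay()
are hypothetical names standing in for the Kconfig value and the
per-engine runtime setting. When the macro is 0 the early return is a
compile-time constant, so the compiler can discard everything after it;
any non-zero build still defers to the runtime value.

```c
/* Hypothetical stand-in for the Kconfig default; 0 means "compiled out". */
#ifndef HEARTBEAT_INTERVAL_MS
#define HEARTBEAT_INTERVAL_MS 2500
#endif

/* Hypothetical per-engine state; the field models a runtime override. */
struct engine_sketch {
	unsigned long heartbeat_interval_ms;
};

/* Returns the delay until the next pulse in ms, or 0 if heartbeats are off. */
static unsigned long next_heartbeat_delay(const struct engine_sketch *e)
{
	/*
	 * Constant-false for any non-zero build. With a 0 build this is
	 * constant-true, and the code below becomes unreachable and is
	 * eligible for dead-code elimination.
	 */
	if (!HEARTBEAT_INTERVAL_MS)
		return 0;

	/* Otherwise the runtime value decides, including a runtime 0. */
	return e->heartbeat_interval_ms;
}
```

Building the sketch with -DHEARTBEAT_INTERVAL_MS=0 makes the function
return 0 regardless of the runtime field, which models "disabled at
compilation".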

> > +static void heartbeat(struct work_struct *wrk)
> > +{
> 
> <SNIP>
> 
> > +       if (i915_modparams.enable_hangcheck)
> > +               engine->heartbeat.systole = i915_request_get(rq);
> 
> I'd be more inclined to the userspace opt-in for running indefinitely and
> getting rid of the modparam completely.

That's a separate challenge. :)
 
> The long workloads might not even preempt at the desired granularity.

Indeed, but as that is a QoS impact on another user, I would similarly
suggest that opting out of this is akin to setting yourself to be
non-preemptable, ergo a restricted operation.
-Chris

