[Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats

Mon Jul 29 16:38:49 UTC 2019

> -----Original Message-----
> From: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Sent: Monday, July 29, 2019 5:50 AM
> To: Bloomfield, Jon <jon.bloomfield at intel.com>; intel-
> gfx at lists.freedesktop.org; Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Ursulin, Tvrtko <tvrtko.ursulin at intel.com>
> Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats
> 
> Quoting Chris Wilson (2019-07-29 12:45:52)
> > Quoting Joonas Lahtinen (2019-07-29 10:26:47)
> > > Ok, so just confirming here. The plan is still to have userspace set a
> > > per context (or per request) time limit for expected completion of a
> > > request. This will be useful for the media workloads that consume a
> > > deterministic amount of time for a correct bitstream. And the userspace
> > > wants to be notified much quicker than the generic hangcheck time if
> > > the operation failed due to a corrupt bitstream.
> > >
> > > This time limit can be set to infinite by compute workloads.
> >
> > That only provides a cap on the context itself.

We need to make sure that proposals such as the above are compatible with GuC. The nice thing about the heartbeat is that it relies on a more or less standard request/context and so should be compatible with any back end.

> 
> Yes.
> 
> > We also have the
> > criterion that if something else has been selected to run on the GPU, you
> > have to allow preemption within a certain period or else you will be
> > shot.
> 
> This is what I meant ...
> 
> > > Then, in parallel to that, we have cgroups or system wide configuration
> > > for maximum allowed timeslice per process/context. That means that a
> > > long-running workload must pre-empt at that granularity.
> 
> ... with this.
> 
> > Not quite. It must preempt within a few ms of being asked, that is a
> > different problem to the timeslice granularity (which is when we ask it
> > to switch, or if not due to a high priority request earlier). It's a QoS
> > issue for the other context. Setting that timeout is hard, we can allow
> > a context to select its own timeout, or define it via sysfs/cgroups, but
> > because it depends on the running context, it causes another context to
> > fail in non-trivial ways. The GPU is simply not as preemptible as one
> > would like.
> 
> Right, I was only thinking about the pre-emption delay, maybe I chose my
> words wrong. Basically what the admin wants to control is exactly what
> you wrote, how long it can take from pre-emption request to completion.
> This is probably useful as a CONTEXT_GETPARAM for userspace to consider.
> They might decide how many loops to run without MI_ARB_CHECK in
> non-pre-emptible sections. Dunno.
> 
> That parameter configures the QoS level of the system, how fast a high
> priority request gets to run on the hardware.
> 
> > Fwiw, I was thinking the next step would be to put per-engine controls
> > in sysfs, then cross the cgroups bridge. I'm not sure my previous plan
> > of exposing per-context parameters for timeslice/preemption is that
> > usable.
> 
> The preferred frequency of how often a context would like to be scheduled
> on the hardware makes sense as a context setparam. Compute workloads are
> probably indifferent and something related to media most likely wants to
> run at some multiple of video FPS.
> 
> I guess the userspace could really only request being run more
> frequently than the default, and in exchange it would receive less
> execution time per slice. We probably want to control the upper
> bound of the frequency.
> 
> > > That pre-emption/heartbeat should happen regardless of whether other
> > > contexts are requesting the hardware or not, because it is better to
> > > start recovery of a hung task as soon as it misbehaves.
> >
> > I concur, but Jon would like the opposite to allow for uncooperative
> > compute kernels that simply block preemption forever. I think for the

I wasn't asking that :-) What I was originally asking for is to allow a compute workload to run forever IF no other contexts need to run. Don't launch a pre-emptive strike, only kill it if it actually blocks a real workload. But then I realized that this is bad for deterministic behaviour. So I retracted the ask.

Using the heartbeat to test the workload for pre-emptibility is a good solution because it ensures a workload always fails quickly, or never fails.
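
As a rough illustration of that argument, here is a toy Python model comparing the two policies. It is only a sketch and not i915 code: the Workload fields, the function names, and the millisecond values are all hypothetical, and it assumes a workload's fate is decided purely by whether it yields within the preemption timeout once asked.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a GPU workload (not an i915 structure)."""
    name: str
    runtime_ms: int              # total time the workload wants on the engine
    max_preempt_latency_ms: int  # worst-case time to yield once asked


def survives_heartbeat(w, heartbeat_interval_ms, preempt_timeout_ms):
    """Heartbeat policy: a pulse is sent every interval whether or not other
    clients exist, and the workload is reset if it cannot yield in time."""
    if w.runtime_ms <= heartbeat_interval_ms:
        return True  # completes before the first pulse would be sent
    return w.max_preempt_latency_ms <= preempt_timeout_ms


def survives_on_demand(w, preempt_timeout_ms, other_client_arrives):
    """Retracted alternative: only test preemption when another context
    actually needs the engine."""
    if not other_client_arrives:
        return True  # nobody asked, so the workload is never tested
    return w.max_preempt_latency_ms <= preempt_timeout_ms


# Assumed values for illustration only.
hog = Workload("uncooperative-compute", runtime_ms=10_000,
               max_preempt_latency_ms=10_000)

# Heartbeat: the outcome depends only on the workload itself.
print(survives_heartbeat(hog, heartbeat_interval_ms=1_000,
                         preempt_timeout_ms=100))

# On demand: the same workload lives or dies depending on its neighbours.
print(survives_on_demand(hog, preempt_timeout_ms=100,
                         other_client_arrives=False))
print(survives_on_demand(hog, preempt_timeout_ms=100,
                         other_client_arrives=True))
```

In this model the on-demand policy gives the same workload different outcomes depending on whether another client happens to show up, which is the non-determinism described above. Under the heartbeat policy the result depends only on the workload's own preemption latency.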

> > extreme Jon wants, something like CPU-isolation fits better, where the
> > important client owns an engine all to itself and the kernel is not even
> > allowed to do housekeeping on that engine. (We would turn off time-
> > slicing, preemption timers, etc on that engine and basically run it in
> > submission order.)

Agreed, isolation is really the only way we could permit a workload to hog an engine indefinitely. This would be beneficial to some of the RTOS use cases in particular.

> 
> Makes sense to me.
> 
> Regards, Joonas