[Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Mon Jul 29 12:50:00 UTC 2019


Quoting Chris Wilson (2019-07-29 12:45:52)
> Quoting Joonas Lahtinen (2019-07-29 10:26:47)
> > Ok, so just confirming here. The plan is still to have userspace set a
> > per context (or per request) time limit for expected completion of a
> > request. This will be useful for the media workloads that consume
> > deterministic amount of time for correct bitstream. And the userspace
> > wants to be notified much quicker than the generic hangcheck time if
> > the operation failed due to corrupt bitstream.
> > 
> > This time limit can be set to infinite by compute workloads.
> 
> That only provides a cap on the context itself.

Yes.

> We also have the
> criterion that if something else has been selected to run on the GPU, you
> have to allow preemption within a certain period or else you will be
> shot.

This is what I meant ...

> > Then, in parallel to that, we have cgroups or system wide configuration
> > for maximum allowed timeslice per process/context. That means that a
> > long-running workload must pre-empt at that granularity.

... with this.

> Not quite. It must preempt within a few ms of being asked, that is a
> different problem to the timeslice granularity (which is when we ask it
> to switch, or if not due to a high priority request earlier). It's a QoS
> issue for the other context. Setting that timeout is hard, we can allow
> a context to select its own timeout, or define it via sysfs/cgroups, but
> because it depends on the running context, it causes another context to
> fail in non-trivial ways. The GPU is simply not as preemptible as one
> would like.

Right, I was only thinking about the pre-emption delay, maybe I chose my
words poorly. Basically, what the admin wants to control is exactly what
you wrote: how long it can take from a pre-emption request to completion.
This is probably useful as a CONTEXT_GETPARAM for userspace to consider.
It might decide how many loops to run without MI_ARB_CHECK in
non-pre-emptible sections. Dunno.

That parameter configures the QoS level of the system: how fast a high
priority request gets to run on the hardware.

> Fwiw, I was thinking the next step would be to put per-engine controls
> in sysfs, then cross the cgroups bridge. I'm not sure my previous plan
> of exposing per-context parameters for timeslice/preemption is that
> usable.

The preferred frequency at which a context would like to be scheduled
on the hardware makes sense as a context setparam. Compute workloads are
probably indifferent, and anything related to media most likely wants to
run at some multiple of the video FPS.

I guess userspace could really only request being run more frequently
than the default, and in exchange it would receive less execution time
per slice. We probably want to control the upper bound of the frequency.

> > That pre-emption/heartbeat should happen regardless of whether other
> > contexts are requesting the hardware or not, because it is better to
> > start recovery of a hung task as soon as it misbehaves.
> 
> I concur, but Jon would like the opposite to allow for uncooperative
> compute kernels that simply block preemption forever. I think for the
> extreme Jon wants, something like CPU-isolation fits better, where the
> important client owns an engine all to itself and the kernel is not even
> allowed to do housekeeping on that engine. (We would turn off time-
> slicing, preemption timers, etc on that engine and basically run it in
> submission order.)

Makes sense to me.

Regards, Joonas
