[Intel-gfx] [RFC 0/8] Force preemption

Thu Mar 22 19:59:33 UTC 2018

> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Jeff McGee
> Sent: Thursday, March 22, 2018 12:09 PM
> To: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Kondapally, Kalyan <kalyan.kondapally at intel.com>; intel-
> gfx at lists.freedesktop.org; ben at bwidawsk.net
> Subject: Re: [Intel-gfx] [RFC 0/8] Force preemption
> 
> On Thu, Mar 22, 2018 at 05:41:57PM +0000, Tvrtko Ursulin wrote:
> >
> > On 22/03/2018 16:01, Jeff McGee wrote:
> > >On Thu, Mar 22, 2018 at 03:57:49PM +0000, Tvrtko Ursulin wrote:
> > >>
> > >>On 22/03/2018 14:34, Jeff McGee wrote:
> > >>>On Thu, Mar 22, 2018 at 09:28:00AM +0000, Chris Wilson wrote:
> > >>>>Quoting Tvrtko Ursulin (2018-03-22 09:22:55)
> > >>>>>
> > >>>>>On 21/03/2018 17:26, jeff.mcgee at intel.com wrote:
> > >>>>>>From: Jeff McGee <jeff.mcgee at intel.com>
> > >>>>>>
> > >>>>>>Force preemption uses engine reset to enforce a limit on the time
> > >>>>>>that a request targeted for preemption can block. This feature is
> > >>>>>>a requirement in automotive systems where the GPU may be
> shared by
> > >>>>>>clients of critically high priority and clients of low priority that
> > >>>>>>may not have been curated to be preemption friendly. There may
> be
> > >>>>>>more general applications of this feature. I'm sharing as an RFC to
> > >>>>>>stimulate that discussion and also to get any technical feedback
> > >>>>>>that I can before submitting to the product kernel that needs this.
> > >>>>>>I have developed the patches for ease of rebase, given that this is
> > >>>>>>for the moment considered a non-upstreamable feature. It would
> be
> > >>>>>>possible to refactor hangcheck to fully incorporate force
> preemption
> > >>>>>>as another tier of patience (or impatience) with the running
> request.
> > >>>>>
> > >>>>>Sorry if it was mentioned elsewhere and I missed it - but does this
> work
> > >>>>>only with stateless clients - or in other words, what would happen to
> > >>>>>stateful clients which would be force preempted? Or the answer is
> we
> > >>>>>don't care since they are misbehaving?
> > >>>>
> > >>>>They get notified of being guilty for causing a gpu reset; three strikes
> > >>>>and they are out (banned from using the gpu) using the current rules.
> > >>>>This is a very blunt hammer that requires the rest of the system to be
> > >>>>robust; one might argue time spent making the system robust would
> be
> > >>>>better served making sure that the timer never expired in the first
> place
> > >>>>thereby eliminating the need for a forced gpu reset.
> > >>>>-Chris
> > >>>
> > >>>Yes, for simplication the policy applied to force preempted contexts
> > >>>is the same as for hanging contexts. It is known that this feature
> > >>>should not be required in a fully curated system. It's a requirement
> > >>>if end user will be alllowed to install 3rd party apps to run in the
> > >>>non-critical domain.
> > >>
> > >>My concern is whether it safe to call this force _preemption_, while
> > >>it is not really expected to work as preemption from the point of
> > >>view of preempted context. I may be missing some angle here, but I
> > >>think a better name would include words like maximum request
> > >>duration or something.
> > >>
> > >>I can see a difference between allowed maximum duration when there
> > >>is something else pending, and when it isn't, but I don't
> > >>immediately see that we should consider this distinction for any
> > >>real benefit?
> > >>
> > >>So should the feature just be "maximum request duration"? This would
> > >>perhaps make it just a special case of hangcheck, which ignores head
> > >>progress, or whatever we do in there.
> > >>
> > >>Regards,
> > >>
> > >>Tvrtko
> > >
> > >I think you might be unclear about how this works. We're not starting a
> > >preemption to see if we can cleanly remove a request who has begun to
> > >exceed its normal time slice, i.e. hangcheck. This is about bounding
> > >the time that a normal preemption can take. So first start preemption
> > >in response to higher-priority request arrival, then wait for preemption
> > >to complete within a certain amount of time. If it does not, resort to
> > >reset.
> > >
> > >So it's really "force the resolution of a preemption", shortened to
> > >"force preemption".
> >
> > You are right, I veered off in my thinking and ended up with
> > something different. :)
> >
> > I however still think the name is potentially misleading, since the
> > request/context is not getting preempted. It is getting effectively
> > killed (sooner or later, directly or indirectly).
> >
> > Maybe that is OK for the specific use case when everything is only
> > broken and not malicious.
> >
> > In a more general purpose system it would be a bit random when
> > something would work, and when it wouldn't, depending on system
> > setup and even timings.
> >
> > Hm, maybe you don't even really benefit from the standard three
> > strikes and you are out policy, and for this specific use case, you
> > should just kill it straight away. If it couldn't be preempted once,
> > why pay the penalty any more?
> >
> > If you don't have it already, devising a solution which blacklists
> > the process (if it creates more contexts), or even a parent (if
> > forking is applicable and implementation feasible), for offenders
> > could also be beneficial.
> >
> > Regards,
> >
> > Tvrtko
> 
> Fair enough. There wasn't a lot of deliberation on this name. We
> referred to it in various ways during development. I think I started
> using "force preemption" because it was short. "reset to preempt" was
> another phrase that was used.
> 
> The handling of the guilty client/context could be tailored more. Like
> I said it was easiest to start with the same sort of handling that we
> have already for hang scenarios. Simple is good when you are rebasing
> without much hope to upstream. :(
> 
> If there was interest in upstreaming this capability, we could certainly
> incorporate nicely within a refactoring of hangcheck. And then we
> wouldn't even need a special name for it. The whole thing could be recast
> as time slice management, where your slice is condition-based. You get
> unlimited time if no one wants the engine, X time if someone of equal
> priority wants the engine, and Y time if someone of higher priority
> wants the engine, etc. Where 'Y' is analogous to the fpreempt_timeout
> value in my RFC.
> 
On the subject of it being "a bit random when something would work":
Currently it's the well behaved app (and everything else behind it)
that pays the price for the badly-behaved umd being a bad citizen. 12s is
a really long time for the GUI to freeze before a bad context can be nuked.

Yes, with enforced pre-emption latency, the bad umd pays a price in that,
if it's lucky, it will run to completion. If unlucky it gets nuked. But it's a bad
umd at the end of the day. And we should find it and fix it. 

With fair-scheduling, this becomes even more important - all umd's need
to ensure that they are capable of pre-empting to the finest granularity
supported by the h/w for their respective workloads. If they don't
they really should be obliterated (aka detected and fixed).

The big benefit I see with the new approach (vs TDR) is that there's actually
no need to enforce the (slightly arbitrarily determined 3x4s timeout) for a
context at all, providing it plays nicely. If you want to run a really long
GPGPU workload, do so by all means, just don't hog the GPU.

BTW, the same context banning should be applied to a context nuked
by forced-preemption as with TDR. That's the intention at least. This is just
an RFC right now.