[Intel-gfx] [RFC 0/8] Force preemption

Thu Mar 22 15:57:49 UTC 2018

On 22/03/2018 14:34, Jeff McGee wrote:
> On Thu, Mar 22, 2018 at 09:28:00AM +0000, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2018-03-22 09:22:55)
>>>
>>> On 21/03/2018 17:26, jeff.mcgee at intel.com wrote:
>>>> From: Jeff McGee <jeff.mcgee at intel.com>
>>>>
>>>> Force preemption uses engine reset to enforce a limit on the time
>>>> that a request targeted for preemption can block. This feature is
>>>> a requirement in automotive systems where the GPU may be shared by
>>>> clients of critically high priority and clients of low priority that
>>>> may not have been curated to be preemption friendly. There may be
>>>> more general applications of this feature. I'm sharing as an RFC to
>>>> stimulate that discussion and also to get any technical feedback
>>>> that I can before submitting to the product kernel that needs this.
>>>> I have developed the patches for ease of rebase, given that this is
>>>> for the moment considered a non-upstreamable feature. It would be
>>>> possible to refactor hangcheck to fully incorporate force preemption
>>>> as another tier of patience (or impatience) with the running request.
>>>
>>> Sorry if it was mentioned elsewhere and I missed it - but does this work
>>> only with stateless clients - or in other words, what would happen to
>>> stateful clients which would be force preempted? Or the answer is we
>>> don't care since they are misbehaving?
>>
>> They get notified of being guilty for causing a gpu reset; three strikes
>> and they are out (banned from using the gpu) using the current rules.
>> This is a very blunt hammer that requires the rest of the system to be
>> robust; one might argue time spent making the system robust would be
>> better served making sure that the timer never expired in the first place
>> thereby eliminating the need for a forced gpu reset.
>> -Chris
> 
> Yes, for simplicity the policy applied to force preempted contexts
> is the same as for hanging contexts. It is known that this feature
> should not be required in a fully curated system. It's a requirement
> if end users will be allowed to install 3rd party apps to run in the
> non-critical domain.

My concern is whether it is safe to call this force _preemption_, when it
is not really expected to work as preemption from the point of view of
the preempted context. I may be missing some angle here, but I think a
better name would include words like maximum request duration or something.

I can see a difference between the allowed maximum duration when there is
something else pending and when there isn't, but I don't immediately see
any real benefit in making this distinction.

So should the feature just be "maximum request duration"? This would 
perhaps make it just a special case of hangcheck, which ignores head 
progress, or whatever we do in there.
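
A minimal sketch of the two policies being compared, to make the
distinction concrete. All names here (duration_policy, request_expired,
the millisecond limits) are hypothetical and not part of i915; this only
illustrates the decision, not how hangcheck or engine reset actually work.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy knobs; none of these exist in i915. */
struct duration_policy {
	/* Limit applied while another request is waiting for the engine. */
	uint64_t max_ms_contended;
	/* Limit applied when nothing else is pending; 0 means no limit. */
	uint64_t max_ms_uncontended;
};

/*
 * Returns true if the running request has outlived its allowed duration,
 * i.e. the point at which the engine would be reset.
 */
bool request_expired(const struct duration_policy *p,
		     uint64_t running_ms, bool other_pending)
{
	uint64_t limit = other_pending ? p->max_ms_contended
				       : p->max_ms_uncontended;

	return limit != 0 && running_ms > limit;
}
```

With max_ms_uncontended set to 0 this behaves like "force preemption"
(the timer only matters when something else is pending). With both limits
set to the same value it is a plain "maximum request duration".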

Regards,

Tvrtko