[Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

John Harrison john.c.harrison at intel.com
Thu Mar 3 19:09:22 UTC 2022


On 3/3/2022 01:55, Tvrtko Ursulin wrote:
> On 02/03/2022 17:55, John Harrison wrote:
>
>>> I was assuming 2.5s tP is enough and basing all calculations on 
>>> that, heartbeat or timeslicing regardless. I thought we established 
>>> that neither of us knows how long is enough.
>>>
>>> Are you now saying 2.5s is definitely not enough? How is that usable 
>>> for a default out of the box desktop?
>> Show me your proof that 2.5s is enough.
>>
>> 7.5s is what we have been using internally for a very long time. It 
>> has approval from all relevant parties. If you wish to pick a new 
>> random number then please provide data to back it up, along with 
>> buy-in from all UMD teams and project management.
>
> And the upstream disabling of preemption has acks from compute. 
> "Internally" is about as far from the out of the box desktop 
> experience we have been arguing about as it gets. In fact, you have 
> argued that compute disables the heartbeat anyway.
>
> Let's jump to the end of this reply for the actions, please.
>
>>> And I don't have a problem with extending the last pulse. It is 
>>> fundamentally correct to do regardless of the backend. I just raised 
>>> the question of whether to extend all heartbeats to account for 
>>> preemption (and scheduling delays). (What is the point of bumping 
>>> their priority and re-scheduling if we didn't give enough time to 
>>> the engine to react? So the opposite of the question you raise.)
>> The point is that we are giving enough time to react. Raising the 
>> priority of a pre-emption that has already been triggered will have 
>> no effect. So as long as the total time from when the pre-emption is 
>> triggered (prio becomes sufficiently high) to the point when the 
>> reset is decided is longer than the pre-emption timeout then it 
>> works. Given that, it is unnecessary to increase the intermediate 
>> periods. It has no advantage and has the disadvantage of making the 
>> total time unreasonably long.
>>
>> So again, what is the point of making every period longer? What 
>> benefit does it *actually* give?
>
> Less special casing, and no pointless prio bumps before the engine 
> has even been given time to react. You wouldn't have to have the last 
> pulse at 2 * tP but at the normal tH + tP. So again, it is nicer for 
> me to derive all heartbeat pulses from the same input parameters.
>
> The whole "it is very long" argument is IMO moot because now proposed 
> 7.5s preempt period is I suspect wholly impractical for desktop. 
> Combined with the argument that real compute disables heartbeats 
> anyway even extra so.
The whole thing is totally fubar already. Right now pre-emption is 
totally disabled, so you are currently waiting for the entire heartbeat 
sequence to complete and then nuking the entire machine. Arguing that 
7.5s is too long is therefore pointless: 7.5s is a big improvement over 
what is currently enabled.

And 'nice' would be having hardware that worked in a sensible manner. 
There is no nice here. There is only 'what is the least worst option'. 
And the least worst option for an end user is a long pre-emption timeout 
with a not massively long heartbeat. If that means a very slight 
complication in the heartbeat code, that is a trivial concern.
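
To illustrate what that slight complication amounts to, here is a 
minimal sketch. This is a sketch only: the function and parameter names 
are invented for this discussion and are not the actual 
intel_engine_heartbeat.c code, and whether the final interval should be 
tH + tP or 2 * tP is exactly the question debated above.

static unsigned long heartbeat_pulse_delay(int pulses_remaining,
                                           unsigned long tH, /* heartbeat interval */
                                           unsigned long tP) /* pre-emption timeout */
{
        /*
         * Intermediate pulses only escalate priority. Bumping priority
         * again before an already triggered pre-emption has run its
         * course achieves nothing, so there is no reason to stretch
         * these intervals.
         */
        if (pulses_remaining > 1)
                return tH;

        /*
         * The final interval is the one after which a reset is
         * declared, so it alone must cover the pre-emption triggered
         * by the last, maximum-priority pulse.
         */
        return tH + tP;
}

Only the last interval depends on tP; every earlier one stays at the 
plain heartbeat period.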


>
>> Fine. "tP(RCS) = 7500;" can I merge the patch now?
> I could live with setting the preempt timeout to 7.5s. The downside 
> is a slower time to restore frozen desktops. Worst case today is 
> 5 * 2.5s; with the changes it is 4 * 2.5s + 2 * 7.5s; so from 12.5s 
> to 25s, a doubling.
But that is the worst case scenario (when something much more severe 
than an application hang has occurred). The regular case would be the 
second heartbeat period plus the pre-emption timeout, and an engine-only 
reset rather than a full GT reset. So it is still better than what we 
have at present.
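
Putting rough numbers on that (tH = 2.5s, tP = 7.5s, five pulses, and 
reading "second heartbeat period" as 2 * tH; the exact pulse accounting 
here is my assumption):

    worst case today:        5 * tH          = 12.5s  (full GT reset)
    worst case with patch:   4 * tH + 2 * tP = 25.0s  (full GT reset)
    regular case with patch: 2 * tH + tP     = 12.5s  (engine-only reset)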

>
> Actions:
>
> 1)
> Get a number from compute/OpenCL people for what they say is minimum 
> preempt timeout for default out of the box Linux desktop experience.
That would be the one that has been agreed upon by both Linux software 
arch and all the UMD teams, and that has been in use for the past year 
or more in the internal tree.

>
> This does not mean them running some tests where they can't be 
> bothered to set up the machine for the extreme use cases, but 
> workloads that average users can realistically be expected to run.
>
> Say, for instance, some image manipulation software which is OpenCL 
> accelerated, or similar. How long are the unpreemptable sections 
> expected to be there? I am not familiar with all the OpenCL 
> accelerated use cases there are on Linux.
>
> And this number should be purely about the minimum preempt timeout, 
> not considering heartbeats. This is because the preempt timeout may 
> kick in sooner than a stopped heartbeat if the user workload is low 
> priority.
>
And the driver is simply hosed for the intervening six months or more 
that it takes for the right people to find the time to do this.

Right now, it is broken. This patch set improves things. The actual 
numbers can be refined later as and when some random use case that we 
hadn't previously thought of pops up. But not fixing the basic problem 
at all until we have a solution that is absolutely perfect for all 
parties is pointless. Not least because there is no perfect solution: 
no matter what number you pick, it is going to be wrong for someone.

2.5s, 7.5s, X.Ys, I really don't care. 2.5s is a number you seem to have 
picked out of the air totally at random, or maybe based on it being the 
heartbeat period (except that you keep arguing that basing tP on tH is 
wrong). 7.5s is a number that has been in active use for a lot of 
testing for quite some time - KMD CI, UMD CI, E2E, etc. But either way, 
the initial number is almost irrelevant as long as it is not zero. So 
can we please just get something merged now as a starting point?


> 2)
> Commit message should explain the effect on the worst case time until 
> engine reset.
>
> 3)
> OpenCL/compute should ack the change publicly as well since they acked 
> the disabling of preemption.
This patch set has already been publicly acked by the compute team. See 
the 'Acked-by' tag.


>
> 4)
> I really want overflows_type in the first patch.
In the final GuC assignment? Only if it is a BUG_ON. If we get a failure 
there, it is an internal driver error and cannot be corrected for. It is 
too late for any plausible range-check action.

And if you mean in the actual helper function with the rest of the 
clamping, then you are bleeding internal GuC API structure details into 
non-GuC code. Plus the test would be right next to the 'if (size < 
OFFICIAL_GUC_RANGE_LIMIT)' test, which just looks dumb as well as being 
redundant duplication - "if ((value < GUC_LIMIT) && (value < 
NO_WE_REALLY_MEAN_IT_GUC_LIMIT))". And putting it inside the GuC limit 
definition looks even worse: "#define LIMIT min(MAX_U32, 100*1000) /* 
because the developer doesn't know how big a u32 is */".
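
For reference, a minimal sketch of the two placements being argued 
over, reusing the made-up names from above (illustrative only, not the 
actual GuC interface code):

/*
 * (a) At the final GuC assignment: a failure here is an uncorrectable
 * internal driver error, so the only plausible action is a BUG_ON.
 * overflows_type() tests whether the value fits the destination type.
 */
GEM_BUG_ON(overflows_type(value, desc->preemption_timeout));
desc->preemption_timeout = value;

/*
 * (b) In the generic clamping helper: an overflows_type() test here
 * would sit right next to the official range check and merely restate
 * it, while bleeding the GuC structure layout into non-GuC code.
 */
if (value > OFFICIAL_GUC_RANGE_LIMIT)
        value = OFFICIAL_GUC_RANGE_LIMIT;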

John.

>
> My position is that with the above satisfied it is okay to merge.
>
> Regards,
>
> Tvrtko


