[Intel-gfx] [PATCH v4 3/4] drm/i915: Make the heartbeat play nice with long pre-emption timeouts

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Sep 30 09:22:01 UTC 2022


On 29/09/2022 17:21, John Harrison wrote:
> On 9/29/2022 00:42, Tvrtko Ursulin wrote:
>> On 29/09/2022 03:18, John.C.Harrison at Intel.com wrote:
>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>
>>> Compute workloads are inherently not pre-emptible for long periods on
>>> current hardware. As a workaround for this, the pre-emption timeout
>>> for compute capable engines was disabled. This is undesirable with GuC
>>> submission as it prevents per engine reset of hung contexts. Hence the
>>> next patch will re-enable the timeout but bumped up by an order of
>>> magnitude.
>>>
>>> However, the heartbeat might not respect that. Depending upon current
>>> activity, a pre-emption to the heartbeat pulse might not even be
>>> attempted until the last heartbeat period. Which means that only one
>>> period is granted for the pre-emption to occur. With the aforesaid
>>> bump, the pre-emption timeout could be significantly larger than this
>>> heartbeat period.
>>>
>>> So adjust the heartbeat code to take the pre-emption timeout into
>>> account. When it reaches the final (high priority) period, it now
>>> ensures the delay before hitting reset is bigger than the pre-emption
>>> timeout.
>>>
>>> v2: Fix for selftests which adjust the heartbeat period manually.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>> ---
>>>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 19 +++++++++++++++++++
>>>   1 file changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
>>> b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> index a3698f611f457..823a790a0e2ae 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> @@ -22,9 +22,28 @@
>>>     static bool next_heartbeat(struct intel_engine_cs *engine)
>>>   {
>>> +    struct i915_request *rq;
>>>       long delay;
>>>         delay = READ_ONCE(engine->props.heartbeat_interval_ms);
>>> +
>>> +    rq = engine->heartbeat.systole;
>>> +
>>> +    if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
>>> +        delay == engine->defaults.heartbeat_interval_ms) {
>>
>> Maybe I forgot but what is the reason for the check against the 
>> default heartbeat interval?
> That's the 'v2: fix for selftests that manually adjust the heartbeat'. 
> If something (or someone) has explicitly set an override of the 
> heartbeat then it has to be assumed that they know what they are doing, 
> and if things don't work any more that's their problem. But if we don't 
> respect their override then they won't get the timings they expect and 
> the selftest will fail.

Isn't this a bit too strict for the non-selftest case? If the new 
concept is extending the last pulse to guarantee preemption, then I 
think we could allow tweaking of the heartbeat period. Like what if user 
wants 1s, or 10s instead of 2.5s - why would that need to break the 
improvement from this patch?

In what ways selftests fail? Are they trying to guess time to reset 
based on the hearbeat period set? If so perhaps add a helper to query it 
based on the last pulse extension.

Regards,

Tvrtko


More information about the Intel-gfx mailing list