[Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads

John Harrison john.c.harrison at intel.com
Wed Feb 23 02:22:28 UTC 2022

On 2/22/2022 01:53, Tvrtko Ursulin wrote:
> On 18/02/2022 21:33, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>> Compute workloads are inherently not pre-emptible on current hardware.
>> Thus the pre-emption timeout was disabled as a workaround to prevent
>> unwanted resets. Instead, the hang detection was left to the heartbeat
>> and its (longer) timeout. This is undesirable with GuC submission as
>> the heartbeat is a full GT reset rather than a per engine reset and so
>> is much more destructive. Instead, just bump the pre-emption timeout
> Can we have a feature request to allow asking GuC for an engine reset?
For what purpose?

GuC manages the scheduling of contexts across engines. With virtual 
engines, the KMD has no knowledge of which engine a context might be 
executing on. Even without virtual engines, the KMD still has no 
knowledge of which context is currently executing on any given engine at 
any given time.

There is a reason why hang detection should be left to the entity that 
is doing the scheduling. Any other entity is second-guessing at best.

The reason for keeping the heartbeat around even when GuC submission is 
enabled is for the case where the KMD and GuC have somehow got out of 
sync with each other, or GuC itself has just crashed. I.e. when no 
submission at all is working and we need to reset the GuC itself and 
start over.
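
As a rough illustration of that split (not actual i915 or GuC code; 
every name below is made up for the example): the scheduler deals with 
a hung context via a per-engine reset, and the heartbeat only comes 
into play when the scheduler itself has stopped responding.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this models the policy, not the driver. */
enum reset_scope {
	RESET_NONE,	/* nothing to do */
	RESET_ENGINE,	/* per-engine reset, decided by the scheduler (GuC) */
	RESET_FULL_GT,	/* full GT reset, decided by the KMD heartbeat */
};

/*
 * scheduler_responsive: submission is still making progress, so the
 *                       scheduler can be trusted to spot a hung context.
 * context_hung:         the scheduler's pre-emption timeout has expired
 *                       on some context.
 */
static enum reset_scope pick_reset(bool scheduler_responsive, bool context_hung)
{
	/* Scheduler out of sync or dead: only a full reset can recover. */
	if (!scheduler_responsive)
		return RESET_FULL_GT;

	/* Scheduler knows which context is on which engine: reset just that. */
	if (context_hung)
		return RESET_ENGINE;

	return RESET_NONE;
}
```

The point of the ordering is that the destructive path is only taken 
when the less destructive one cannot work at all.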



> Tvrtko
>> to a big value. Also, update the heartbeat to allow such a long
>> pre-emption delay in the final heartbeat period.
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> John Harrison (3):
>>    drm/i915/guc: Limit scheduling properties to avoid overflow
>>    drm/i915/gt: Make the heartbeat play nice with long pre-emption
>>      timeouts
>>    drm/i915: Improve long running OCL w/a for GuC submission
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 37 +++++++++++++++++--
>>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 16 ++++++++
>>   drivers/gpu/drm/i915/gt/sysfs_engines.c       | 14 +++++++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  9 +++++
>>   4 files changed, 73 insertions(+), 3 deletions(-)

