[PATCH v4 0/4] Improve anti-pre-emption w/a for compute workloads
John.C.Harrison at Intel.com
Thu Sep 29 02:18:09 UTC 2022
From: John Harrison <John.C.Harrison at Intel.com>
Compute workloads are inherently not pre-emptible on current hardware.
Thus the pre-emption timeout was disabled as a workaround to prevent
unwanted resets, and hang detection was left to the heartbeat and its
(longer) timeout. This is undesirable with GuC submission because a
heartbeat-triggered reset is a full GT reset rather than a per-engine
reset and so is much more destructive. Instead, just bump the
pre-emption timeout to a big value. Also, update the heartbeat to allow
for such a long pre-emption delay in the final heartbeat period.
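
As a rough illustration of the heartbeat change described above, the
sketch below stretches only the final heartbeat period so that a long
pre-emption timeout can expire (and trigger the per-engine reset) before
the heartbeat escalates to a full GT reset. This is not the code from the
series: the struct, the function name and the factor of two are
assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-engine scheduling properties. */
struct engine_props {
	uint64_t heartbeat_interval_ms; /* normal gap between heartbeat pulses */
	uint64_t preempt_timeout_ms;    /* time allowed for a pre-emption to complete */
};

/*
 * Delay to wait before the next heartbeat check.
 *
 * In the final period, wait long enough for a pre-emption to be
 * attempted and to time out, so the less destructive per-engine reset
 * gets a chance to fire first. The factor of two is an assumed safety
 * margin, not a value taken from the patches.
 */
static uint64_t next_heartbeat_delay_ms(const struct engine_props *p,
					int final_period)
{
	uint64_t delay = p->heartbeat_interval_ms;

	if (final_period) {
		uint64_t longer = p->preempt_timeout_ms * 2;

		/* Only ever extend the period, never shorten it. */
		if (longer > delay)
			delay = longer;
	}

	return delay;
}
```

With a 2500 ms heartbeat interval and a 7500 ms pre-emption timeout, the
ordinary periods stay at 2500 ms while the final one becomes 15000 ms. A
pre-emption timeout shorter than half the heartbeat interval leaves the
final period unchanged.
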
v2: Add clamping helpers.
v3: Remove long timeout algorithm and replace with hard coded value
(review feedback from Tvrtko). Also, fix execlist selftest failure and
fix bug in compute enabling patch related to pre-emption timeouts.
v4: Add multiple BUG_ONs to re-check already range checked values (Tvrtko)
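
For readers unfamiliar with the overflow that the clamping helpers guard
against, here is a minimal sketch. It assumes that a timeout kept in
milliseconds is converted to microseconds and stored in a 32-bit field;
that assumption, the macro and both function names are hypothetical and
are not taken from the patches.

```c
#include <stdint.h>

/* Largest millisecond value whose microsecond equivalent fits in 32 bits. */
#define MAX_TIMEOUT_MS (UINT32_MAX / 1000u)

/* Clamp a requested timeout so the later ms -> us conversion cannot overflow. */
static uint64_t clamp_timeout_ms(uint64_t value_ms)
{
	return value_ms > MAX_TIMEOUT_MS ? MAX_TIMEOUT_MS : value_ms;
}

/* Convert to microseconds for a (hypothetical) 32-bit firmware field. */
static uint32_t timeout_ms_to_us(uint64_t value_ms)
{
	return (uint32_t)(clamp_timeout_ms(value_ms) * 1000u);
}
```

Clamping at the point where the value is set means the conversion itself
never has to handle an out-of-range input; re-checking the already
clamped value at the point of use is then a pure sanity check.
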
Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
John Harrison (4):
  drm/i915/guc: Limit scheduling properties to avoid overflow
  drm/i915: Fix compute pre-emption w/a to apply to compute engines
  drm/i915: Make the heartbeat play nice with long pre-emption timeouts
  drm/i915: Improve long running compute w/a for GuC submission

 drivers/gpu/drm/i915/Kconfig.profile          |  26 ++++-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   6 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 102 +++++++++++++++---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  19 ++++
 drivers/gpu/drm/i915/gt/sysfs_engines.c       |  25 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  21 ++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   8 ++
 7 files changed, 179 insertions(+), 28 deletions(-)
--
2.37.3