[Intel-gfx] [PATCH] drm/i915/gt: reset RING_HEAD during intel_gt_unset_wedged
Tejas Upadhyay
tejaskumarx.surendrakumar.upadhyay at intel.com
Mon Dec 20 12:00:30 UTC 2021
During repeated wedge/unwedge cycles, i915_request_retire()
was found to zap the old request with 0x6b6b6b6b.
On unwedging, we write a new request at RING_TAIL,
expecting execution to start from that position,
but execution resumes from RING_HEAD (preserved
from an earlier wakeup before wedging) and
consumes the 0x6b poison.
Resetting the kernel/user context state brings
RING_HEAD back in line with RING_TAIL, so new requests
execute from where they are submitted, which resolves
the issue. Normally this reset is applied when unpinning
a user context, or for kernel_contexts upon waking up
the device. But a fast wedge/unwedge sequence keeps the
device awake, preserving the RING_HEAD from before.
Testcase: igt@gem_eio@unwedge-stress
Note: Current user impact is assessed to be low, as
this only affects intel_gt_unset_wedged(), which is
currently only used during testing and upon
suspend/resume (where the device was already flushed and
will reset the kernel_contexts on waking up). In the
future, though, this will present an issue for PCI
error recovery.
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay at intel.com>
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 8 ++++++++
drivers/gpu/drm/i915/gt/intel_reset.c | 3 +++
2 files changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 352254e001b4..7e1c561bce69 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1455,9 +1455,17 @@ void intel_engines_reset_default_submission(struct intel_gt *gt)
enum intel_engine_id id;
for_each_engine(engine, gt, id) {
+ struct intel_context *ce = engine->kernel_context;
+
if (engine->sanitize)
engine->sanitize(engine);
+ /* Reset RING_HEAD so we don't consume the old
+ * poisoned request on unwedging
+ */
+ if (ce)
+ ce->ops->reset(ce);
+
engine->set_default_submission(engine);
}
}
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 7be0002d9d70..1c26e936e699 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -961,6 +961,9 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
}
spin_unlock(&timelines->lock);
+ /* Ensure that all non-kernel contexts are unpinned as well */
+ intel_gt_retire_requests(gt);
+
/* We must reset pending GPU events before restoring our submission */
ok = !HAS_EXECLISTS(gt->i915); /* XXX better agnosticism desired */
if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
--
2.31.1