[PATCH v2 1/2] drm/xe/guc/ct: Increase wait timeout for g2h response
Badal Nilawar
badal.nilawar at intel.com
Wed Oct 16 11:52:55 UTC 2024
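Occasionally, the G2H worker starts running more than a second after it
has been queued and activated by the Linux workqueue subsystem. Because
guc_ct_send_recv() waits only HZ (1 second) for the G2H response, this
delay causes spurious G2H timeout errors.
To prevent them, increase the wait timeout from HZ to HZ * 3 (3 seconds).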
v2: Add a comment describing this change, with a TODO (Matt B/John H)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: John Harrison <John.C.Harrison at Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
---
drivers/gpu/drm/xe/xe_guc_ct.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index c7673f56d413..3096baa4c9f4 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1016,7 +1016,17 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
return ret;
}
- ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
+ /*
+ * Occasionally the G2H worker starts running more than a second after it has
+ * been queued and activated by the Linux workqueue subsystem, leading to G2H
+ * timeout errors. This is seen especially in xe_pm runs and in the GT reset
+ * flow, where xe_guc_ct_send_recv() is used. The wait timeout is therefore
+ * increased to prevent these G2H timeout errors.
+ *
+ * TODO: Reduce the timeout once the workqueue scheduling delay is root-caused and fixed.
+ */
+
+ ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ * 3);
/*
* Ensure we serialize with completion side to prevent UAF with fence going out of scope on
--
2.34.1
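The failure mode described in the patch can be modeled in userspace. The sketch below is a hypothetical illustration under stated assumptions, not the xe driver's code: it uses POSIX threads and a condition variable in place of the kernel's wait_event_timeout() and workqueues, and the names model_fence, late_worker, wait_done_timeout, sleep_ms and model_send_recv are invented for this example. A worker thread sets a done flag after a configurable delay while the waiter blocks with a timeout, which shows how a worker that runs later than the wait budget produces a timeout even though the response eventually arrives.

```c
/*
 * Hypothetical userspace model, not xe driver code: a pthread condition
 * variable stands in for ct->g2h_fence_wq and a bool for g2h_fence.done.
 * Times are in milliseconds. Build with -pthread.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct model_fence {
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* models ct->g2h_fence_wq */
	bool done;		/* models g2h_fence.done */
	long worker_delay_ms;	/* how late the modeled G2H worker runs */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };

	/* Resume the remaining sleep if a signal interrupts it. */
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/* Models the G2H worker that is scheduled late, then completes the fence. */
static void *late_worker(void *arg)
{
	struct model_fence *f = arg;

	sleep_ms(f->worker_delay_ms);
	pthread_mutex_lock(&f->lock);
	f->done = true;
	pthread_cond_broadcast(&f->wq);
	pthread_mutex_unlock(&f->lock);
	return NULL;
}

/* Loosely models wait_event_timeout(): true if done is set within timeout_ms. */
static bool wait_done_timeout(struct model_fence *f, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&f->lock);
	/* rc == 0 covers spurious wakeups; ETIMEDOUT or an error ends the wait. */
	while (!f->done && rc == 0)
		rc = pthread_cond_timedwait(&f->wq, &f->lock, &deadline);
	done = f->done;
	pthread_mutex_unlock(&f->lock);
	return done;
}

/* One request/response round: start the late worker, then wait for it. */
bool model_send_recv(long worker_delay_ms, long timeout_ms)
{
	struct model_fence f = { .done = false,
				 .worker_delay_ms = worker_delay_ms };
	pthread_t worker;
	bool ok;

	pthread_mutex_init(&f.lock, NULL);
	pthread_cond_init(&f.wq, NULL);
	if (pthread_create(&worker, NULL, late_worker, &f) != 0) {
		ok = false;
	} else {
		ok = wait_done_timeout(&f, timeout_ms);
		/* Join before f goes out of scope so the worker never uses a dead stack. */
		pthread_join(worker, NULL);
	}
	pthread_cond_destroy(&f.wq);
	pthread_mutex_destroy(&f.lock);
	return ok;
}
```

With times scaled down by a factor of five (1 s modeled as 200 ms), model_send_recv(300, 200) models a worker running 1.5 s late against the old HZ budget and should return false, while model_send_recv(300, 600) models the new HZ * 3 budget and should return true. As the TODO in the patch notes, a larger budget only hides the scheduling delay; it does not remove it.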