[PATCH AUTOSEL 6.11 28/32] drm/xe: Enlarge the invalidation timeout from 150 to 500

Sasha Levin sashal at kernel.org
Mon Oct 28 10:50:10 UTC 2024


From: Shuicheng Lin <shuicheng.lin at intel.com>

[ Upstream commit c8fb95e7a54315460b45090f0968167a332e1657 ]

There are error messages like below that are occurring during stress
testing: "[   31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global
invalidation timeout". Previously it was hitting this 3 out of 1000
executions of warm reboot.  After raising it to 500, 1000 warm reboot
executions passed and it didn't fail.

Due to the way xe_mmio_wait32() is implemented, the timeout is able to
expire early when the register matches the expected value due to the
wait increments starting small. So, the larger timeout value should have
no effect during normal use cases.

v2 (Jonathan):
  - rework the commit message
v3 (Lucas):
  - add conclusive message for the fail rate and test case
v4:
  - add suggested-by

Suggested-by: Jia Yao <jia.yao at intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
Cc: Lucas De Marchi <lucas.demarchi at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Nirmoy Das <nirmoy.das at intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
Tested-by: Zongyao Bai <zongyao.bai at intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das at intel.com>
Signed-off-by: Matthew Auld <matthew.auld at intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015161207.1373401-1-shuicheng.lin@intel.com
(cherry picked from commit 2eb460ab9f4bc5b575f52568d17936da0af681d8)
[ Fix conflict with gt->mmio ]
Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---
 drivers/gpu/drm/xe/xe_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 8a44a2b6dcbb6..5226333cfdd6d 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -870,7 +870,7 @@ void xe_device_l2_flush(struct xe_device *xe)
 	spin_lock(&gt->global_invl_lock);
 	xe_mmio_write32(gt, XE2_GLOBAL_INVAL, 0x1);
 
-	if (xe_mmio_wait32(gt, XE2_GLOBAL_INVAL, 0x1, 0x0, 150, NULL, true))
+	if (xe_mmio_wait32(gt, XE2_GLOBAL_INVAL, 0x1, 0x0, 500, NULL, true))
 		xe_gt_err_once(gt, "Global invalidation timeout\n");
 	spin_unlock(&gt->global_invl_lock);
 
-- 
2.43.0



More information about the Intel-xe mailing list