[CI 3/3] drm/xe/xe-for-ci: Check whether oom was due to ww mutex error injection
Thomas Hellström
thomas.hellstrom at linux.intel.com
Mon Jun 10 15:20:17 UTC 2024
When CONFIG_DEBUG_WW_MUTEX_SLOWPATH is enabled (as it is in CI, but not
in production kernels), an injected -EDEADLK error will, due to
limitations in TTM, cause a false OOM notification.
Check whether the OOM was likely caused by an -EDEADLK injection and,
if so, rerun the validation.
Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
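
Note (not part of the commit): the retry decision described above can be
modelled in userspace roughly as follows. This is a minimal sketch, not
the driver code. struct mock_exec and the mock_* helpers are hypothetical
stand-ins for struct drm_exec and the functions touched by this patch,
and the final "return true" is an assumption, since the tail of
xe_vm_validate_should_retry() is not visible in the hunk below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ww acquire ticket embedded in drm_exec. */
struct mock_exec {
	const void *contending_lock; /* non-NULL once contention was flagged */
};

/* Mirrors the CONFIG_DEBUG_WW_MUTEX_SLOWPATH variant of the new helper. */
static bool mock_contention_injected(const struct mock_exec *exec)
{
	return exec->contending_lock != NULL;
}

/*
 * Model of the retry decision:
 *  - -ENOMEM in exclusive mode with contention flagged: retry, since the
 *    OOM was likely a side effect of an injected -EDEADLK.
 *  - any other error, or a genuine -ENOMEM in exclusive mode: give up.
 *  - first -ENOMEM: switch to exclusive mode and retry (assumed tail).
 */
static bool mock_validate_should_retry(const struct mock_exec *exec, int err,
				       bool *exclusive)
{
	assert(exec && exclusive);

	if (err == -ENOMEM && *exclusive && mock_contention_injected(exec))
		return true;

	if (err != -ENOMEM || *exclusive)
		return false;

	*exclusive = true;
	return true;
}
```

With this model, a second -ENOMEM in exclusive mode is retried only when
the ticket still records a contending lock. Without error injection, the
behaviour matches the pre-patch logic.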
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3399c7e5bf4d..4d10049a962e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -337,6 +337,19 @@ static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
/* TODO: Inform user the VM is banned */
}
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+
+static bool xe_exec_contention_injected(struct drm_exec *exec)
+{
+	return !!exec->ticket.contending_lock;
+}
+
+#else
+
+#define xe_exec_contention_injected(_a) (false)
+
+#endif
+
/**
* xe_vm_validate_should_retry() - Whether to retry after a validate error.
* @exec: The drm_exec object used for locking before validation.
@@ -356,7 +369,10 @@ static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
*/
bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, bool *exclusive)
{
-	if (err != -ENOMEM || *exclusive)
+	if (err == -ENOMEM && *exclusive && xe_exec_contention_injected(exec))
+		return true;
+
+	if (err != -ENOMEM || *exclusive)
 		return false;
 	*exclusive = true;
--
2.44.0