[PATCH i-g-t] lib/amdgpu: Handle -ENODATA in amdgpu_wait_memory
Jesse.zhang@amd.com
Wed Jan 15 07:03:59 UTC 2025

The amdgpu_wait_memory function currently asserts whenever a return
value is neither zero nor -ECANCELED. However, -ENODATA is also a
legitimate return code during GPU job timeout recovery, specifically
when the kernel resets an individual queue. This patch updates both
checks in the function to accept -ENODATA as a non-fatal result as
well.

This change aligns with recent updates to the AMDGPU kernel driver,
which uses -ENODATA to indicate a queue-specific reset during timeout
recovery and -ECANCELED or -ETIME to indicate a full GPU reset. For
more details, see the kernel discussion:
https://lists.freedesktop.org/archives/amd-gfx/2025-January/118795.html

Cc: Vitaly Prosyak <vitaly.prosyak at amd.com>
Cc: Christian Koenig <christian.koenig at amd.com>
Cc: Alexander Deucher <alexander.deucher at amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang at amd.com>
---
lib/amdgpu/amd_deadlock_helpers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/amdgpu/amd_deadlock_helpers.c b/lib/amdgpu/amd_deadlock_helpers.c
index 8ac6abf8f..f274a6365 100644
--- a/lib/amdgpu/amd_deadlock_helpers.c
+++ b/lib/amdgpu/amd_deadlock_helpers.c
@@ -142,7 +142,7 @@ amdgpu_wait_memory(amdgpu_device_handle device_handle, unsigned int ip_type, uin
job_count++;
} while (r == 0 && job_count < MAX_JOB_COUNT);
- if (r != 0 && r != -ECANCELED)
+ if (r != 0 && r != -ECANCELED && r != -ENODATA)
igt_assert(0);
@@ -156,7 +156,7 @@ amdgpu_wait_memory(amdgpu_device_handle device_handle, unsigned int ip_type, uin
r = amdgpu_cs_query_fence_status(&fence_status, AMDGPU_TIMEOUT_INFINITE, 0,
&expired);
- if (r != 0 && r != -ECANCELED)
+ if (r != 0 && r != -ECANCELED && r != -ENODATA)
igt_assert(0);
/* send signal to modify the memory we wait for */
--
2.25.1
More information about the igt-dev mailing list