[PATCH i-g-t] lib/amdgpu: Handle -ENODATA in amdgpu_wait_memory
vitaly prosyak
vprosyak at amd.com
Wed Jan 15 21:05:45 UTC 2025
The change looks good to me.
Reviewed-by: Vitaly Prosyak <vitaly.prosyak at amd.com>
On 2025-01-15 02:03, Jesse.zhang at amd.com wrote:
> The amdgpu_wait_memory function currently asserts if the return value
> is non-zero and not -ECANCELED. However, -ENODATA is also a valid
> error code that can be returned during GPU job timeout recovery,
> particularly for queue resets. This patch updates the function to
> also accept -ENODATA as a non-fatal error condition.
>
> This change aligns with recent updates in the AMDGPU kernel driver
> where -ENODATA is used to indicate queue-specific resets during
> timeout recovery, while -ECANCELED or -ETIME is used for full GPU
> resets. For more details, see the kernel discussion:
> https://lists.freedesktop.org/archives/amd-gfx/2025-January/118795.html
>
> Cc: Vitaly Prosyak <vitaly.prosyak at amd.com>
> Cc: Christian Koenig <christian.koenig at amd.com>
> Cc: Alexander Deucher <alexander.deucher at amd.com>
>
> Signed-off-by: Jesse Zhang <jesse.zhang at amd.com>
> ---
> lib/amdgpu/amd_deadlock_helpers.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/amdgpu/amd_deadlock_helpers.c b/lib/amdgpu/amd_deadlock_helpers.c
> index 8ac6abf8f..f274a6365 100644
> --- a/lib/amdgpu/amd_deadlock_helpers.c
> +++ b/lib/amdgpu/amd_deadlock_helpers.c
> @@ -142,7 +142,7 @@ amdgpu_wait_memory(amdgpu_device_handle device_handle, unsigned int ip_type, uin
> job_count++;
> } while (r == 0 && job_count < MAX_JOB_COUNT);
>
> - if (r != 0 && r != -ECANCELED)
> + if (r != 0 && r != -ECANCELED && r != -ENODATA)
> igt_assert(0);
>
>
> @@ -156,7 +156,7 @@ amdgpu_wait_memory(amdgpu_device_handle device_handle, unsigned int ip_type, uin
>
> r = amdgpu_cs_query_fence_status(&fence_status, AMDGPU_TIMEOUT_INFINITE, 0,
> &expired);
> - if (r != 0 && r != -ECANCELED)
> + if (r != 0 && r != -ECANCELED && r != -ENODATA)
> igt_assert(0);
>
> /* send signal to modify the memory we wait for */
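
The condition the patch repeats at both call sites can be read as a single predicate over the return code. The following is only a minimal sketch of that logic, not code from IGT: the helper names wait_result_is_acceptable and check_wait_result are hypothetical, and plain assert() stands in for igt_assert() so the snippet builds without the IGT headers. It assumes a Linux <errno.h> that defines ENODATA and ETIME.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of IGT): returns true when the return
 * code is one the patched condition tolerates. 0 means success,
 * -ECANCELED is the code the check already accepted, and -ENODATA is the
 * code the patch adds for queue-specific resets. Note that -ETIME is not
 * accepted here, matching the condition in the diff.
 */
static bool wait_result_is_acceptable(int r)
{
	return r == 0 || r == -ECANCELED || r == -ENODATA;
}

/*
 * Stand-in for the patched "if (...) igt_assert(0);" pattern, using the
 * standard assert() instead of igt_assert().
 */
static void check_wait_result(int r)
{
	assert(wait_result_is_acceptable(r));
}
```

Factoring the test into one predicate would keep the two call sites from drifting apart if further reset-related codes need to be tolerated later; whether that is worth doing in amd_deadlock_helpers.c is a judgment for the maintainers.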