[PATCH 3/3] drm/amdgpu: report bad status in GPU recovery
Zhang, Hawking
Hawking.Zhang at amd.com
Thu Aug 1 10:30:26 UTC 2024
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao Zhou
Sent: Thursday, August 1, 2024 18:00
To: amd-gfx at lists.freedesktop.org
Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: [PATCH 3/3] drm/amdgpu: report bad status in GPU recovery
Instead of printing GPU reset failed.
v2: add check for reset_context->src.
Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5d49f70704c6..7b21243c7c55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5921,8 +5921,14 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
tmp_adev->asic_reset_res = 0;
if (r) {
- /* bad news, how to tell it to userspace ? */
- dev_info(tmp_adev->dev, "GPU reset(%d) failed\n", atomic_read(&tmp_adev->gpu_reset_counter));
+ /* bad news, how to tell it to userspace ?
+ * for ras error, we should report GPU bad status instead of
+ * reset failure
+ */
+ if (reset_context->src != AMDGPU_RESET_SRC_RAS ||
+ !amdgpu_ras_eeprom_check_err_threshold(tmp_adev))
+ dev_info(tmp_adev->dev, "GPU reset(%d) failed\n",
+ atomic_read(&tmp_adev->gpu_reset_counter));
amdgpu_vf_error_put(tmp_adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
} else {
dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n", atomic_read(&tmp_adev->gpu_reset_counter));
--
2.34.1
More information about the amd-gfx
mailing list