[PATCH] drm/amdgpu: fix check order ras->in_recovery is earlier than ras feature
Bob Zhou
bob.zhou at amd.com
Fri Oct 27 10:02:44 UTC 2023
ras->in_recovery is checked before the RAS feature support check, which
leads to the NULL pointer dereference below. Move the support check ahead
of the in_recovery check to fix it.
BUG: kernel NULL pointer dereference, address: 00000000000000e8
RIP: 0010:amdgpu_ras_reset_error_count+0xf6/0x190 [amdgpu]
Call Trace:
<TASK>
? show_regs+0x72/0x90
? __die+0x25/0x80
? page_fault_oops+0x79/0x190
? do_user_addr_fault+0x30c/0x640
? __wake_up_klogd.part.0+0x40/0x70
? exc_page_fault+0x81/0x1b0
? asm_exc_page_fault+0x27/0x30
? amdgpu_ras_reset_error_count+0xf6/0x190 [amdgpu]
? __pfx_gmc_v9_0_late_init+0x10/0x10 [amdgpu]
gmc_v9_0_late_init+0x97/0xe0 [amdgpu]
Fixes: be5c7eb10406 ("drm/amdgpu: bypass RAS error reset in some conditions")
Signed-off-by: Bob Zhou <bob.zhou at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 303fbb6a48b6..3af50754800d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1229,15 +1229,15 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
 		return -EOPNOTSUPP;
 	}
+	if (!amdgpu_ras_is_supported(adev, block) ||
+	    !amdgpu_ras_get_mca_debug_mode(adev))
+		return -EOPNOTSUPP;
+
 	/* skip ras error reset in gpu reset */
 	if ((amdgpu_in_reset(adev) || atomic_read(&ras->in_recovery)) &&
 	    mca_funcs && mca_funcs->mca_set_debug_mode)
 		return -EOPNOTSUPP;
-	if (!amdgpu_ras_is_supported(adev, block) ||
-	    !amdgpu_ras_get_mca_debug_mode(adev))
-		return -EOPNOTSUPP;
-
 	if (block_obj->hw_ops->reset_ras_error_count)
 		block_obj->hw_ops->reset_ras_error_count(adev);
--
2.34.1
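The check ordering in this patch can be modelled outside the kernel. The following is a minimal user-space sketch, not driver code: `struct dev`, `struct ras_ctx`, `ras_is_supported()` and `reset_error_count()` are hypothetical, simplified stand-ins for `struct amdgpu_device`, `struct amdgpu_ras`, `amdgpu_ras_is_supported()` and `amdgpu_ras_reset_error_count()`. It assumes, as the small faulting address in the oops suggests, that the RAS context pointer can be NULL when RAS is not set up, and that the support check reports "unsupported" in that case. With the support check first, the function returns -EOPNOTSUPP before the in_recovery field is ever read.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for struct amdgpu_ras and
 * struct amdgpu_device; field names are illustrative only.
 */
struct ras_ctx {
	int in_recovery;             /* models atomic_t in_recovery */
};

struct dev {
	struct ras_ctx *ras;         /* NULL when RAS was never set up (assumed) */
	unsigned int ras_enabled;    /* bitmask of RAS-capable blocks */
	bool in_reset;               /* models amdgpu_in_reset() */
	bool mca_debug_mode;         /* models amdgpu_ras_get_mca_debug_mode() */
	bool has_mca_set_debug_mode; /* models mca_funcs->mca_set_debug_mode */
};

/* Assumed behaviour: "unsupported" whenever the context is NULL. */
static bool ras_is_supported(const struct dev *d, unsigned int block)
{
	/* Reject out-of-range blocks to avoid an undefined shift. */
	if (block >= CHAR_BIT * sizeof(d->ras_enabled))
		return false;
	return d->ras != NULL && (d->ras_enabled & (1u << block)) != 0;
}

/* Check order after the patch: support first, then ras->in_recovery. */
static int reset_error_count(const struct dev *d, unsigned int block)
{
	if (!ras_is_supported(d, block) || !d->mca_debug_mode)
		return -EOPNOTSUPP;

	/* Invariant established by the support check above. */
	assert(d->ras != NULL);

	/* Skip the reset during GPU reset/recovery; d->ras is safe to read. */
	if ((d->in_reset || d->ras->in_recovery) && d->has_mca_set_debug_mode)
		return -EOPNOTSUPP;

	/* The driver would call hw_ops->reset_ras_error_count() here. */
	return 0;
}
```

With `ras` set to NULL, `reset_error_count()` returns -EOPNOTSUPP instead of dereferencing the pointer; swapping the two checks back would make that same call read `d->ras->in_recovery` through a NULL pointer, which is the kind of fault shown in the trace above.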
More information about the amd-gfx mailing list