[PATCH] drm/amdgpu: Print kernel message when error logged by scrub

Zhang, Hawking Hawking.Zhang at amd.com
Fri Apr 18 08:28:24 UTC 2025


It's okay to check only the scrub bit, so that the check covers all scenarios rather than just poison creation. Please also update the kernel message to "hardware error logged by the scrubber".

Regards,
Hawking

-----Original Message-----
From: Liu, Xiang(Dean) <Xiang.Liu at amd.com>
Sent: Friday, April 18, 2025 15:32
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Liu, Xiang(Dean) <Xiang.Liu at amd.com>
Subject: [PATCH] drm/amdgpu: Print kernel message when error logged by scrub

Print a kernel message when the scrub bit of the status register is set, to indicate that the error was logged by the scrubber.

Signed-off-by: Xiang Liu <xiang.liu at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index b4ad163f42a7..2b7b3abdbfc7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -120,6 +120,10 @@ static void aca_smu_bank_dump(struct amdgpu_device *adev, int idx, int total, st
        for (i = 0; i < ARRAY_SIZE(aca_regs); i++)
                RAS_EVENT_LOG(adev, event_id, HW_ERR "ACA[%02d/%02d].%s=0x%016llx\n",
                              idx + 1, total, aca_regs[i].name, bank->regs[aca_regs[i].reg_idx]);
+
+       if (ACA_BANK_ERR_IS_DEFFERED(bank) &&
+           ACA_REG__STATUS__SCRUB(bank->regs[ACA_REG_IDX_STATUS]))
+               RAS_EVENT_LOG(adev, event_id, HW_ERR "Error logged by scrub\n");
 }

 static int aca_smu_get_valid_aca_banks(struct amdgpu_device *adev, enum aca_smu_type type,
--
2.34.1
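
The review reply at the top of this thread asks for the check to depend on the scrub bit alone and for the message to read "hardware error logged by the scrubber". A minimal standalone sketch of that simplified check follows. It is not the follow-up patch: the `demo_` names are hypothetical, and the bit position (40) is an assumption used for illustration only. The real field is read through ACA_REG__STATUS__SCRUB() in the driver headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position of the scrub flag, for illustration only.
 * The driver obtains this field via ACA_REG__STATUS__SCRUB(). */
#define DEMO_STATUS_SCRUB_BIT 40

/* Hypothetical helper: true when the scrub bit is set in a raw status value. */
static bool demo_status_scrub_set(uint64_t status)
{
	return (status >> DEMO_STATUS_SCRUB_BIT) & 1ULL;
}

/* Hypothetical helper: returns the message requested in review when the
 * scrub bit is set, or NULL when there is nothing to report. No deferred
 * check is applied, so every scrubber-logged error is covered. */
static const char *demo_scrub_message(uint64_t status)
{
	if (demo_status_scrub_set(status))
		return "hardware error logged by the scrubber";
	return NULL;
}
```

In the driver, the equivalent change would drop the ACA_BANK_ERR_IS_DEFFERED(bank) condition from the hunk above and pass the new string to RAS_EVENT_LOG().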
