[PATCH] drm/amdgpu: Decode deferred error type in gfx aca bank parser
Liu, Xiang(Dean)
Xiang.Liu at amd.com
Thu Mar 20 06:33:26 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
Thanks, will improve it.
Best Regards,
Dean
________________________________
From: Zhang, Hawking <Hawking.Zhang at amd.com>
Sent: Thursday, March 20, 2025 2:22 PM
To: Liu, Xiang(Dean) <Xiang.Liu at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
Subject: RE: [PATCH] drm/amdgpu: Decode deferred error type in gfx aca bank parser
+ bank->aca_err_type = (ACA_REG__STATUS__POISON(status) ||
+ ACA_REG__STATUS__DEFERRED(status)) ?
+ ACA_ERROR_TYPE_DEFERRED :
+ ACA_ERROR_TYPE_UE;
Does it make more sense to create a macro similar to ACA_BANK_ERR_CE_DE_DECODE for above code segment?
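For illustration, a minimal sketch of what such a macro might look like. The macro name ACA_BANK_ERR_UE_DE_DECODE is hypothetical and not taken from the patch. The struct layout, register index, and status bit positions below are stand-ins assumed so the snippet compiles on its own; the real definitions in the amdgpu ACA headers are authoritative.

```c
#include <stdint.h>

/* Stand-in error types; the names mirror the ones used in the patch. */
enum aca_error_type {
	ACA_ERROR_TYPE_UE,
	ACA_ERROR_TYPE_DEFERRED,
};

/*
 * Stand-in single-bit field accessors. The bit positions (44 and 43) are
 * assumptions made for this sketch, not copied from the driver headers.
 */
#define ACA_REG_FIELD(x, bit)		(((x) >> (bit)) & 1ULL)
#define ACA_REG__STATUS__DEFERRED(x)	ACA_REG_FIELD(x, 44)
#define ACA_REG__STATUS__POISON(x)	ACA_REG_FIELD(x, 43)

/* Placeholder register index and count, for illustration only. */
enum { ACA_REG_IDX_STATUS = 1, ACA_REG_COUNT = 8 };

/* Reduced stand-in for the driver's bank structure. */
struct aca_bank {
	uint64_t regs[ACA_REG_COUNT];
	enum aca_error_type aca_err_type;
};

/*
 * Hypothetical helper: an uncorrected-error bank is reported as deferred
 * when either the poison or the deferred status bit is set, and as a plain
 * UE otherwise. This is the same expression as the open-coded ternary above.
 */
#define ACA_BANK_ERR_UE_DE_DECODE(bank)					\
	((ACA_REG__STATUS__POISON((bank)->regs[ACA_REG_IDX_STATUS]) ||	\
	  ACA_REG__STATUS__DEFERRED((bank)->regs[ACA_REG_IDX_STATUS])) ?	\
	 ACA_ERROR_TYPE_DEFERRED : ACA_ERROR_TYPE_UE)
```

With a helper along these lines, the UE case in the parser could reduce to a single assignment, bank->aca_err_type = ACA_BANK_ERR_UE_DE_DECODE(bank), in the same way the CE case already uses ACA_BANK_ERR_CE_DE_DECODE(bank).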
Regards,
Hawking
-----Original Message-----
From: Liu, Xiang(Dean) <Xiang.Liu at amd.com>
Sent: Thursday, March 20, 2025 14:15
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>; Liu, Xiang(Dean) <Xiang.Liu at amd.com>
Subject: [PATCH] drm/amdgpu: Decode deferred error type in gfx aca bank parser
When an uncorrected error is injected while a background workload is running, the deferred errors among the uncorrected errors need to be identified by checking the deferred and poison bits of the status register.
Signed-off-by: Xiang Liu <xiang.liu at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c |  3 +++
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 11 +++++++----
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index ffd4c64e123c..3f45a600f547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -541,6 +541,9 @@ static int __aca_get_error_data(struct amdgpu_device *adev, struct aca_handle *h
if (ret)
return ret;
+ if (type == ACA_ERROR_TYPE_UE)
+ aca_log_aca_error(handle, ACA_ERROR_TYPE_DEFERRED, err_data);
+
return aca_log_aca_error(handle, type, err_data);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index c0de682b7774..b21d784a7f9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -876,7 +876,7 @@ static int gfx_v9_4_3_aca_bank_parser(struct aca_handle *handle,
void *data)
{
struct aca_bank_info info;
- u64 misc0;
+ u64 misc0, status;
u32 instlo;
int ret;
@@ -890,12 +890,15 @@ static int gfx_v9_4_3_aca_bank_parser(struct aca_handle *handle,
info.die_id = instlo == mmSMNAID_XCD0_MCA_SMU ? 0 : 1;
misc0 = bank->regs[ACA_REG_IDX_MISC0];
+ status = bank->regs[ACA_REG_IDX_STATUS];
switch (type) {
case ACA_SMU_TYPE_UE:
- bank->aca_err_type = ACA_ERROR_TYPE_UE;
- ret = aca_error_cache_log_bank_error(handle, &info,
- ACA_ERROR_TYPE_UE, 1ULL);
+ bank->aca_err_type = (ACA_REG__STATUS__POISON(status) ||
+ ACA_REG__STATUS__DEFERRED(status)) ?
+ ACA_ERROR_TYPE_DEFERRED :
+ ACA_ERROR_TYPE_UE;
+ ret = aca_error_cache_log_bank_error(handle, &info,
+ bank->aca_err_type, 1ULL);
break;
case ACA_SMU_TYPE_CE:
bank->aca_err_type = ACA_BANK_ERR_CE_DE_DECODE(bank);
--
2.34.1