[PATCH Review 1/1] drm/amdgpu: skip umc ras error count harvest

Zhang, Hawking Hawking.Zhang at amd.com
Tue Dec 7 06:41:17 UTC 2021


[AMD Official Use Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>

Regards,
Hawking
-----Original Message-----
From: Stanley.Yang <Stanley.Yang at amd.com> 
Sent: Tuesday, December 7, 2021 14:40
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Clements, John <John.Clements at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Li, Candice <Candice.Li at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
Cc: Yang, Stanley <Stanley.Yang at amd.com>
Subject: [PATCH Review 1/1] drm/amdgpu: skip umc ras error count harvest

remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter

Signed-off-by: Stanley.Yang <Stanley.Yang at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1043d41b6807..a95d200adff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -897,11 +897,6 @@ static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_d
 	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 	int ret = 0;
 
-	/* skip get ecc info during gpu recovery */
-	if (atomic_read(&ras->in_recovery) == 1 &&
-		adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2))
-		return;
-
 	/*
 	 * choosing right query method according to
 	 * whether smu support query error information @@ -1752,6 +1747,16 @@ static void amdgpu_ras_log_on_err_counter(struct amdgpu_device *adev)
 		if (info.head.block == AMDGPU_RAS_BLOCK__PCIE_BIF)
 			continue;
 
+		/*
+		 * this is a workaround for aldebaran, skip send msg to
+		 * smu to get ecc_info table due to smu handle get ecc
+		 * info table failed temporarily.
+		 * should be removed until smu fix handle ecc_info table.
+		 */
+		if ((info.head.block == AMDGPU_RAS_BLOCK__UMC) &&
+			(adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2)))
+			continue;
+
 		amdgpu_ras_query_error_status(adev, &info);
 	}
 }
--
2.17.1


More information about the amd-gfx mailing list