[PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration

Chen, Guchun Guchun.Chen at amd.com
Thu Mar 12 03:47:20 UTC 2020


[AMD Public Use]

Thanks for your suggestion, Hawking.
I will send one patch v3 to target this.

Regards,
Guchun

-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang at amd.com> 
Sent: Thursday, March 12, 2020 11:13 AM
To: Chen, Guchun <Guchun.Chen at amd.com>; amd-gfx at lists.freedesktop.org; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Subject: RE: [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration

[AMD Official Use Only - Internal Distribution Only]

Hi Guchun,

It seems to me we still have redundant function call in amdgpu_ras_check_supported. The atomfirmware interfaces are possibly invoked twice?

As I listed the steps in last thread, we can assume hw_supported to 0 or 0xfffffff either. 

Check HBM ECC first, explicitly indicates it is present or not, and set the DF/UMC bit in hw_supported Check SRAM ECC, explicitly indicates It is present or not, and set other ip blocks masks.

After we run all above checks, set the finally ras mask to con->supported.

We'd better keep the message consistent as what we had in gmc_v9_0_late. No need to highlight the what IP block get disabled, that should be transparent to users.

Regards,
Hawking

-----Original Message-----
From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Thursday, March 12, 2020 10:55
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Cc: Chen, Guchun <Guchun.Chen at amd.com>
Subject: [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration

When sram ecc is disabled by vbios, ras initialization process in the corrresponding IPs that suppport sram ecc needs to be skipped. So update ras support capability accordingly on top of this configuration. This capability will block further ras operations to the unsupported IPs.

v2: check HBM ECC enablement and set ras mask accordingly.

Signed-off-by: Guchun Chen <guchun.chen at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 +++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 69b02b9d4131..b08226c10d95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1748,8 +1748,41 @@ static void amdgpu_ras_check_supported (struct amdgpu_device *adev,
 			 amdgpu_atomfirmware_sram_ecc_supported(adev)))
 		*hw_supported = AMDGPU_RAS_BLOCK_MASK;
 
-	*supported = amdgpu_ras_enable == 0 ?
-				0 : *hw_supported & amdgpu_ras_mask;
+	/* Both HBM and SRAM ECC are disabled in vbios. */
+	if (*hw_supported == 0) {
+		DRM_INFO("RAS HW support is disabled as HBM"
+			" and SRAM ECC are not presented.\n");
+		return;
+	}
+
+	if (amdgpu_ras_enable) {
+		*supported = *hw_supported;
+
+		/*
+		 * When HBM ECC is disabled in vbios, remove
+		 * UMC's and DF's ras support.
+		 */
+		if (!amdgpu_atomfirmware_mem_ecc_supported(adev)) {
+			DRM_INFO("HBM ECC is disabled and "
+					"remove UMC and DF ras support.\n");
+			*supported &= ~(1 << AMDGPU_RAS_BLOCK__UMC |
+					1 << AMDGPU_RAS_BLOCK__DF);
+		}
+
+		/*
+		 * When sram ecc is disabled in vbios, bypass those IP
+		 * blocks that support sram ecc, and only hold UMC and DF.
+		 */
+		if (!amdgpu_atomfirmware_sram_ecc_supported(adev)) {
+			DRM_INFO("SRAM ECC is disabled and remove ras support "
+					"from IPs that support sram ecc.\n");
+			*supported &= (1 << AMDGPU_RAS_BLOCK__UMC |
+					1 << AMDGPU_RAS_BLOCK__DF);
+		}
+
+		/* ras support needs to align with module parmeter */
+		*supported &= amdgpu_ras_mask;
+	}
 }
 
 int amdgpu_ras_init(struct amdgpu_device *adev)
--
2.17.1


More information about the amd-gfx mailing list