[PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration

Zhang, Hawking Hawking.Zhang at amd.com
Wed Mar 11 02:28:54 UTC 2020



Hi Guchun,

I would suggest we organize amdgpu_ras_check_supported with the following logic:

1). Disallow the SR-IOV guest/VF driver.
2). Only include ASIC families that have server SKUs.
3). Check HBM ECC flag
	a). explicitly inform users of the availability of this capability
	b). if HBM ECC is not supported, disable UMC/DF RAS in amdgpu_ras_mask
4). Check SRAM ECC flag
	a). explicitly inform users of the availability of this capability
	b). if SRAM ECC is not supported, disable the other IP blocks in amdgpu_ras_mask
5). Remove the redundant RAS atombios query in gmc_v9_0_late_init for VEGA20/ARCTURUS
	a). for Vega10 (legacy RAS), we have to keep informing users of the RAS capability and keep applying the DF workaround
	b). we can try to merge Vega10 as well, but that can be the next step.
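The ordering proposed above could be modelled roughly as the user-space C sketch below. It is only an illustration under stated assumptions: struct fake_dev, check_supported(), the reduced ras_block enum and its numbering are hypothetical stand-ins, not the kernel's struct amdgpu_device, amdgpu_ras_check_supported() or enum amdgpu_ras_block, and printf() stands in for DRM_INFO(). The two int/uint32_t parameters play the role of the amdgpu_ras_enable and amdgpu_ras_mask module parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Reduced, illustrative block list; NOT the kernel's enum amdgpu_ras_block. */
enum ras_block {
	RAS_BLOCK_UMC,
	RAS_BLOCK_SDMA,
	RAS_BLOCK_GFX,
	RAS_BLOCK_MMHUB,
	RAS_BLOCK_DF,
	RAS_BLOCK_COUNT
};

#define RAS_BIT(b)      (1u << (b))
#define RAS_BLOCK_MASK  (RAS_BIT(RAS_BLOCK_COUNT) - 1)
/* Blocks gated by HBM ECC in this model: UMC and DF. */
#define RAS_HBM_BLOCKS  (RAS_BIT(RAS_BLOCK_UMC) | RAS_BIT(RAS_BLOCK_DF))

/* Hypothetical stand-in for the device state the driver would consult. */
struct fake_dev {
	bool is_sriov_vf;      /* step 1: running as an SR-IOV guest/VF */
	bool has_server_skus;  /* step 2: ASIC family ships server SKUs */
	bool hbm_ecc;          /* step 3: VBIOS reports HBM ECC */
	bool sram_ecc;         /* step 4: VBIOS reports SRAM ECC */
};

/* Returns the supported RAS block mask for the modelled device. */
static uint32_t check_supported(const struct fake_dev *d,
				int ras_enable, uint32_t ras_mask)
{
	uint32_t hw;

	assert(d != NULL);

	/* Steps 1 and 2: no RAS for VFs or for client-only families. */
	if (d->is_sriov_vf || !d->has_server_skus)
		return 0;

	hw = RAS_BLOCK_MASK;

	/* Step 3: report HBM ECC; without it, drop UMC and DF. */
	printf("HBM ECC is %s\n", d->hbm_ecc ? "available" : "not available");
	if (!d->hbm_ecc)
		hw &= ~RAS_HBM_BLOCKS;

	/* Step 4: report SRAM ECC; without it, drop every other block. */
	printf("SRAM ECC is %s\n", d->sram_ecc ? "available" : "not available");
	if (!d->sram_ecc)
		hw &= RAS_HBM_BLOCKS;

	/* Module parameters: 0 disables RAS, otherwise apply the mask. */
	return ras_enable == 0 ? 0 : hw & ras_mask;
}
```

Because steps 3b and 4b clear disjoint sets of bits, a part with neither HBM nor SRAM ECC ends up with an empty mask in this model, while a part with only HBM ECC keeps just UMC and DF.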

Regards,
Hawking

-----Original Message-----
From: Chen, Guchun <Guchun.Chen at amd.com> 
Sent: Wednesday, March 11, 2020 09:57
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Cc: Chen, Guchun <Guchun.Chen at amd.com>
Subject: [PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration

When SRAM ECC is disabled by the VBIOS, RAS initialization in the corresponding IPs that support SRAM ECC needs to be skipped. So update the RAS support capability according to this configuration. The updated capability then blocks further RAS operations on those unsupported IPs.

Signed-off-by: Guchun Chen <guchun.chen at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 69b02b9d4131..79be004378fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1748,8 +1748,23 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev,
 			 amdgpu_atomfirmware_sram_ecc_supported(adev)))
 		*hw_supported = AMDGPU_RAS_BLOCK_MASK;
 
-	*supported = amdgpu_ras_enable == 0 ?
-				0 : *hw_supported & amdgpu_ras_mask;
+	if (amdgpu_ras_enable == 0) {
+		*supported = 0;
+	} else {
+		*supported = *hw_supported;
+		/*
+		 * When sram ecc is disabled in vbios, bypass those IP
+		 * blocks that support sram ecc, and only hold UMC and DF.
+		 */
+		if (!amdgpu_atomfirmware_sram_ecc_supported(adev)) {
+			DRM_INFO("Bypass IPs that support sram ecc.\n");
+			*supported &= (1 << AMDGPU_RAS_BLOCK__UMC |
+					1 << AMDGPU_RAS_BLOCK__DF);
+		}
+
+		/* RAS support needs to align with the module parameter */
+		*supported &= amdgpu_ras_mask;
+	}
 }
 
 int amdgpu_ras_init(struct amdgpu_device *adev)
--
2.17.1
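To make the mask arithmetic in this patch concrete, the new else branch can be modelled in user space as below. This is a sketch under assumptions: patched_supported() and the five-entry ras_block enum are hypothetical, with illustrative numbering rather than the values in amdgpu_ras.h; its parameters stand in for *hw_supported, amdgpu_ras_enable, amdgpu_ras_mask and the result of amdgpu_atomfirmware_sram_ecc_supported(adev).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative block numbering; NOT the kernel's enum amdgpu_ras_block. */
enum ras_block {
	RAS_BLOCK_UMC,
	RAS_BLOCK_SDMA,
	RAS_BLOCK_GFX,
	RAS_BLOCK_MMHUB,
	RAS_BLOCK_DF,
	RAS_BLOCK_COUNT
};

#define RAS_BIT(b)     (1u << (b))
#define RAS_BLOCK_MASK (RAS_BIT(RAS_BLOCK_COUNT) - 1)

/* Models the patched tail of amdgpu_ras_check_supported(). */
static uint32_t patched_supported(uint32_t hw_supported, int ras_enable,
				  uint32_t ras_mask, bool sram_ecc)
{
	uint32_t supported;

	/* hw_supported should only carry bits for known blocks. */
	assert((hw_supported & ~RAS_BLOCK_MASK) == 0);

	if (ras_enable == 0)
		return 0;

	supported = hw_supported;
	/* No SRAM ECC: keep only the UMC and DF blocks. */
	if (!sram_ecc)
		supported &= RAS_BIT(RAS_BLOCK_UMC) | RAS_BIT(RAS_BLOCK_DF);

	/* The module parameter can only narrow the result further. */
	return supported & ras_mask;
}
```

With SRAM ECC disabled and an all-ones module mask, only the UMC and DF bits survive; a module mask selecting only GFX then yields an empty mask. Since both filters are bitwise ANDs, applying amdgpu_ras_mask before or after the SRAM ECC filter gives the same result.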

