[PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration
Zhang, Hawking
Hawking.Zhang at amd.com
Wed Mar 11 02:33:29 UTC 2020
[AMD Official Use Only - Internal Distribution Only]
Oops, update the format to make it more readable.
1. Disallow the SR-IOV guest/VF driver.
2. Only include ASIC families that have server SKUs.
3. Disable RAS for all IP blocks if amdgpu_ras_enable == 0.
4. Check the HBM ECC flag.
   a. Explicitly inform users whether this capability is available.
   b. If HBM ECC is not supported, disable UMC/DF RAS in amdgpu_ras_mask.
5. Check the SRAM ECC flag.
   a. Explicitly inform users whether this capability is available.
   b. If SRAM ECC is not supported, disable the other IP blocks in amdgpu_ras_mask.
6. Remove the redundant RAS atombios query in gmc_v9_0_late_init for VEGA20/ARCTURUS.
   a. For Vega10 (legacy RAS), we still have to inform users of the RAS capability and apply the DF workaround.
   b. We could try to merge Vega10 as well, but that can be a next step.
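
To make the ordering concrete, steps 1 to 5 could be sketched roughly as below. This is only a standalone model, not driver code: struct ras_probe, the RAS_BLOCK_* enum (a shortened, illustrative list of blocks) and ras_check_supported() are hypothetical stand-ins for what amdgpu_ras_check_supported() would read from the device, and the printed messages are placeholders. It also assumes that step 3 returns before the ECC flags are reported, which is one possible reading of the list.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative block IDs only; the real driver enum has more entries. */
enum ras_block {
	RAS_BLOCK_UMC,
	RAS_BLOCK_SDMA,
	RAS_BLOCK_GFX,
	RAS_BLOCK_MMHUB,
	RAS_BLOCK_DF,
	RAS_BLOCK_COUNT
};

#define RAS_BLOCK_MASK	((1u << RAS_BLOCK_COUNT) - 1)
/* Blocks covered by HBM ECC; everything else depends on SRAM ECC. */
#define RAS_MEM_BLOCKS	((1u << RAS_BLOCK_UMC) | (1u << RAS_BLOCK_DF))

/* Hypothetical stand-in for the state the real function queries. */
struct ras_probe {
	bool is_sriov_vf;	/* running as SR-IOV guest/VF */
	bool has_server_sku;	/* ASIC family has server SKUs */
	bool hbm_ecc;		/* HBM ECC flag from the vbios */
	bool sram_ecc;		/* SRAM ECC flag from the vbios */
	int ras_enable;		/* models amdgpu_ras_enable */
	uint32_t ras_mask;	/* models amdgpu_ras_mask */
};

static uint32_t ras_check_supported(const struct ras_probe *p)
{
	uint32_t supported;

	/* Steps 1 and 2: no RAS for VFs or for families without server SKUs. */
	if (p->is_sriov_vf || !p->has_server_sku)
		return 0;

	/* Step 3: the module parameter disables every IP block. */
	if (p->ras_enable == 0)
		return 0;

	supported = RAS_BLOCK_MASK & p->ras_mask;

	/* Step 4: report HBM ECC and drop UMC/DF when it is missing. */
	if (p->hbm_ecc) {
		printf("HBM ECC is available.\n");
	} else {
		printf("HBM ECC is not available.\n");
		supported &= ~RAS_MEM_BLOCKS;
	}

	/* Step 5: report SRAM ECC and keep only UMC/DF when it is missing. */
	if (p->sram_ecc) {
		printf("SRAM ECC is available.\n");
	} else {
		printf("SRAM ECC is not available.\n");
		supported &= RAS_MEM_BLOCKS;
	}

	return supported;
}
```

With HBM ECC present and SRAM ECC absent, only the UMC and DF bits survive in this model; with both flags absent the result is 0.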
Regards,
Hawking
-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang at amd.com>
Sent: Wednesday, March 11, 2020 10:31
To: Zhang, Hawking <Hawking.Zhang at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; amd-gfx at lists.freedesktop.org; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Subject: RE: [PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration
Add one more check.
1). Disallow the SR-IOV guest/VF driver.
2). Only include ASIC families that have server SKUs.
3). Disable RAS for all IP blocks if amdgpu_ras_enable == 0.
4). Check the HBM ECC flag.
    a). Explicitly inform users whether this capability is available.
    b). If HBM ECC is not supported, disable UMC/DF RAS in amdgpu_ras_mask.
5). Check the SRAM ECC flag.
    a). Explicitly inform users whether this capability is available.
    b). If SRAM ECC is not supported, disable the other IP blocks in amdgpu_ras_mask.
6). Remove the redundant RAS atombios query in gmc_v9_0_late_init for VEGA20/ARCTURUS.
    a). For Vega10 (legacy RAS), we still have to inform users of the RAS capability and apply the DF workaround.
    b). We could try to merge Vega10 as well, but that can be a next step.
Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Zhang, Hawking
Sent: Wednesday, March 11, 2020 10:29
To: Chen, Guchun <Guchun.Chen at amd.com>; amd-gfx at lists.freedesktop.org; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Subject: RE: [PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration
Hi Guchun,
I would suggest we organize amdgpu_ras_check_supported with the following logic:
1). Disallow the SR-IOV guest/VF driver.
2). Only include ASIC families that have server SKUs.
3). Check the HBM ECC flag.
    a). Explicitly inform users whether this capability is available.
    b). If HBM ECC is not supported, disable UMC/DF RAS in amdgpu_ras_mask.
4). Check the SRAM ECC flag.
    a). Explicitly inform users whether this capability is available.
    b). If SRAM ECC is not supported, disable the other IP blocks in amdgpu_ras_mask.
5). Remove the redundant RAS atombios query in gmc_v9_0_late_init for VEGA20/ARCTURUS.
    a). For Vega10 (legacy RAS), we still have to inform users of the RAS capability and apply the DF workaround.
    b). We could try to merge Vega10 as well, but that can be a next step.
Regards,
Hawking
-----Original Message-----
From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Wednesday, March 11, 2020 09:57
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
Cc: Chen, Guchun <Guchun.Chen at amd.com>
Subject: [PATCH] drm/amdgpu: update ras support capability with different sram ecc configuration
When sram ecc is disabled by the vbios, the ras initialization process in the corresponding IPs that support sram ecc needs to be skipped. So update the ras support capability accordingly on top of this configuration. This capability will block further ras operations on the unsupported IPs.
Signed-off-by: Guchun Chen <guchun.chen at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 69b02b9d4131..79be004378fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1748,8 +1748,23 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev,
amdgpu_atomfirmware_sram_ecc_supported(adev)))
*hw_supported = AMDGPU_RAS_BLOCK_MASK;
- *supported = amdgpu_ras_enable == 0 ?
- 0 : *hw_supported & amdgpu_ras_mask;
+ if (amdgpu_ras_enable == 0)
+ *supported = 0;
+ else {
+ *supported = *hw_supported;
+ /*
+ * When sram ecc is disabled in vbios, bypass those IP
+ * blocks that support sram ecc, and only hold UMC and DF.
+ */
+ if (!amdgpu_atomfirmware_sram_ecc_supported(adev)) {
+ DRM_INFO("Bypass IPs that support sram ecc.\n");
+ *supported &= (1 << AMDGPU_RAS_BLOCK__UMC |
+ 1 << AMDGPU_RAS_BLOCK__DF);
+ }
+
+ /* ras support needs to align with module parameter */
+ *supported &= amdgpu_ras_mask;
+ }
}
int amdgpu_ras_init(struct amdgpu_device *adev)
--
2.17.1