[PATCH] drm/amdgpu: enable RAS poison flag when GPU is connected to CPU

Ziya, Mohammad zafar Mohammadzafar.Ziya at amd.com
Wed Dec 8 07:33:59 UTC 2021


[AMD Official Use Only]



>-----Original Message-----
>From: Zhou1, Tao <Tao.Zhou1 at amd.com>
>Sent: Wednesday, December 8, 2021 12:52 PM
>To: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>;
>amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>
>Subject: RE: [PATCH] drm/amdgpu: enable RAS poison flag when GPU is
>connected to CPU
>
>[AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>
>> Sent: Wednesday, December 8, 2021 2:47 PM
>> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org;
>> Zhang, Hawking <Hawking.Zhang at amd.com>
>> Subject: RE: [PATCH] drm/amdgpu: enable RAS poison flag when GPU is
>> connected to CPU
>>
>> [AMD Official Use Only]
>>
>>
>>
>> >-----Original Message-----
>> >From: Zhou1, Tao <Tao.Zhou1 at amd.com>
>> >Sent: Wednesday, December 8, 2021 11:50 AM
>> >To: amd-gfx at lists.freedesktop.org; Zhang, Hawking
>> ><Hawking.Zhang at amd.com>; Ziya, Mohammad zafar
>> ><Mohammadzafar.Ziya at amd.com>
>> >Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
>> >Subject: [PATCH] drm/amdgpu: enable RAS poison flag when GPU is
>> >connected to CPU
>> >
>> >The RAS poison mode is enabled by default on the platform.
>> >
>> >Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
>> >---
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +++++-
>> > 1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> >b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> >index a95d200adff9..0003f2c64da8 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> >@@ -2372,7 +2372,11 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>> > 	}
>> >
>> > 	/* Init poison supported flag, the default value is false */
>> >-	if (adev->df.funcs &&
>> >+	if (adev->gmc.xgmi.connected_to_cpu) {
>>
>> Why not also consider devices connected over the PCIe interface by
>> default? Will a PCIe-connected device not see this issue?
>
>[Tao] What do you mean by "PCIe interface connected device"?
>I think the else if path can handle other platforms and the default value of
>poison_supported is false on other systems.

[zafar]: I mean a GPU connected to the CPU over a PCIe link, with no XGMI link, and whose VBIOS flag does not support UMC RAS error injection.
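
To make the case under discussion concrete, the branch order in the hunk can be modelled roughly as below. This is only a sketch: struct model_dev, model_poison_supported(), report_on() and report_off() are hypothetical names, not driver code, and because the body of the else-if branch is not shown in the hunk, the sketch assumes poison is enabled there only when both the DF and UMC queries report it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the hunk reads; not struct amdgpu_device. */
struct model_dev {
	bool connected_to_cpu;          /* models adev->gmc.xgmi.connected_to_cpu */
	bool (*df_query_poison)(void);  /* models adev->df.funcs->query_ras_poison_mode */
	bool (*umc_query_poison)(void); /* models adev->umc.ras_funcs->query_ras_poison_mode */
};

/* Hypothetical query callbacks that report poison mode on or off. */
bool report_on(void)  { return true; }
bool report_off(void) { return false; }

/* Mirrors the branch order of the patched hunk in amdgpu_ras_init(). */
bool model_poison_supported(const struct model_dev *dev)
{
	/* Init poison supported flag, the default value is false */
	bool poison_supported = false;

	if (dev->connected_to_cpu) {
		/* enabled by default when GPU is connected to CPU */
		poison_supported = true;
	} else if (dev->df_query_poison && dev->umc_query_poison) {
		/* ASSUMPTION: the real branch body is not shown in the hunk;
		 * here both queries must report poison mode. */
		poison_supported = dev->df_query_poison() &&
				   dev->umc_query_poison();
	}

	return poison_supported;
}
```

Under these assumptions, a PCIe-only device (connected_to_cpu false) with no query callbacks keeps the default of false, and one with both callbacks follows whatever the two queries report.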

>
>>
>> >+		/* enabled by default when GPU is connected to CPU */
>> >+		con->poison_supported = true;
>> >+	}
>> >+	else if (adev->df.funcs &&
>> > 	    adev->df.funcs->query_ras_poison_mode &&
>> > 	    adev->umc.ras_funcs &&
>> > 	    adev->umc.ras_funcs->query_ras_poison_mode) {
>> >--
>> >2.17.1

