[PATCH] drm/amdgpu: Clear garbage data in err_data before usage

Gu, JiaWei (Will) JiaWei.Gu at amd.com
Thu Jan 6 10:22:15 UTC 2022


[AMD Official Use Only]

Injecting one uncorrectable error through the ras_ctrl sys node on Sienna Cichlid triggers two interrupts.
I was told that the two interrupts are expected: when the error address is not 64-byte aligned, one 64-byte SDP request is split into two 32-byte requests in the UMC and sent to DRAM.

The handler for the second interrupt then reads the garbage data left in err_data.
As a result, the UE counter increases by 2 instead of 1, and a page at address 0x0 is saved unexpectedly.
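
To illustrate the double counting, here is a minimal user-space sketch, not the driver code. The struct err_record, fake_cb() and handle_entries() names are hypothetical stand-ins, and the sketch assumes the callback leaves the record untouched for the second entry, so whatever the first pass wrote is still there. It models only the counter symptom, not the 0x0 page.

```c
#include <string.h>

/* Hypothetical stand-in for the driver's error record (not the real struct). */
struct err_record {
	unsigned long ue_count;
	unsigned long err_addr;
};

/*
 * Hypothetical callback: only the first entry reports an error.
 * For later entries it writes nothing, so the record keeps whatever
 * it held before the call (assumption made for this sketch).
 */
static void fake_cb(struct err_record *rec, int entry)
{
	if (entry == 0) {
		rec->ue_count = 1;
		rec->err_addr = 0x1000;
	}
}

/* Process nentries entries and return the accumulated UE count. */
static unsigned long handle_entries(int nentries, int clear_each_time)
{
	struct err_record rec;
	unsigned long total_ue = 0;
	int i;

	memset(&rec, 0, sizeof(rec));
	for (i = 0; i < nentries; i++) {
		/* The fix: start every iteration from a clean record. */
		if (clear_each_time)
			memset(&rec, 0, sizeof(rec));
		fake_cb(&rec, i);
		total_ue += rec.ue_count;
	}
	return total_ue;
}
```

With two entries, handle_entries(2, 0) returns 2 because the count from the first pass is added again, while handle_entries(2, 1) returns 1.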

Best regards,
Jiawei  

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1 at amd.com> 
Sent: Thursday, January 6, 2022 6:05 PM
To: Gu, JiaWei (Will) <JiaWei.Gu at amd.com>; amd-gfx at lists.freedesktop.org; Clements, John <John.Clements at amd.com>; Yang, Stanley <Stanley.Yang at amd.com>; Deng, Emily <Emily.Deng at amd.com>
Cc: Gu, JiaWei (Will) <JiaWei.Gu at amd.com>
Subject: RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage

[AMD Official Use Only]

Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>

May I know how you reproduce the issue?

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of 
> Jiawei Gu
> Sent: Thursday, January 6, 2022 5:17 PM
> To: amd-gfx at lists.freedesktop.org; Clements, John 
> <John.Clements at amd.com>; Yang, Stanley <Stanley.Yang at amd.com>; Deng, 
> Emily <Emily.Deng at amd.com>
> Cc: Gu, JiaWei (Will) <JiaWei.Gu at amd.com>
> Subject: [PATCH] drm/amdgpu: Clear garbage data in err_data before 
> usage
> 
> The memory of err_data should be cleared before each use when there are
> multiple entries in the RAS IH.
> Otherwise garbage data from the previous loop iteration will be used.
> 
> Signed-off-by: Jiawei Gu <Jiawei.Gu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 31bad1a20ed0..3f5bf5780ebf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct
> ras_manager *obj)
>  				/* Let IP handle its data, maybe we need get the output
>  				 * from the callback to udpate the error type/count, etc
>  				 */
> +				memset(&err_data, 0, sizeof(err_data));
>  				ret = data->cb(obj->adev, &err_data, &entry);
>  				/* ue will trigger an interrupt, and in that case
>  				 * we need do a reset to recovery the whole system.
> --
> 2.17.1

