[PATCH] drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb

Zhou1, Tao Tao.Zhou1 at amd.com
Fri Apr 10 04:18:02 UTC 2020


Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>

> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen at amd.com>
> Sent: April 10, 2020 11:55
> To: amd-gfx at lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
> Cc: Chen, Guchun <Guchun.Chen at amd.com>
> Subject: [PATCH] drm/amdgpu: add uncorrectable error count print in UMC
> ecc irq cb
> 
> The uncorrectable error (UE) count is not printed after a UMC UE injection.
> By the time the GPU recovery work thread reaches the error count log
> function, the count from the last injection can no longer be read and
> printed, because the error status register is automatically cleared when it
> is read in the UMC ECC irq callback. So print the count in the UMC ECC irq
> callback, consistent with the other RAS error interrupt cases.
> 
> Signed-off-by: Guchun Chen <guchun.chen at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index f4d40855147b..267f7c30f4dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -121,6 +121,9 @@ int amdgpu_umc_process_ras_data_cb(struct amdgpu_device *adev,
> 
>  	/* only uncorrectable error needs gpu reset */
>  	if (err_data->ue_count) {
> +		dev_info(adev->dev, "%ld uncorrectable errors detected in UMC block\n",
> +			err_data->ue_count);
> +
>  		if (err_data->err_addr_cnt &&
>  		    amdgpu_ras_add_bad_pages(adev, err_data->err_addr,
>  						err_data->err_addr_cnt))
> --
> 2.17.1
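
The clear-on-read behaviour described in the commit message can be sketched
in user-space C. This is only an illustration under stated assumptions, not
the driver code: struct fake_umc_status and the fake_* helpers are
hypothetical stand-ins, and the real register access in amdgpu looks
different. It shows why the count has to be printed by the first reader (the
irq callback) and why a later read from the recovery path would see zero.

```c
#include <stdio.h>

/* Hypothetical stand-in for a clear-on-read hardware error counter. */
struct fake_umc_status {
	unsigned long ue_count;
};

/*
 * Assumed register semantics: a read returns the latched count and
 * clears it, as the commit message describes for the error status
 * register.
 */
static unsigned long fake_umc_read_ue(struct fake_umc_status *reg)
{
	unsigned long val = reg->ue_count;

	reg->ue_count = 0;
	return val;
}

/* Models the ECC irq callback: the first reader, so it sees the count. */
static unsigned long fake_irq_cb(struct fake_umc_status *reg)
{
	unsigned long ue = fake_umc_read_ue(reg);

	if (ue)
		printf("%lu uncorrectable errors detected in UMC block\n", ue);
	return ue;
}

/* Models a later error count query from the recovery path. */
static unsigned long fake_recovery_log(struct fake_umc_status *reg)
{
	return fake_umc_read_ue(reg);
}
```

Calling fake_irq_cb() on a register holding a nonzero count prints and
returns that count; a following fake_recovery_log() call on the same
register returns 0, which is the case the patch works around by printing in
the callback.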

