[PATCH] drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb
Zhou1, Tao
Tao.Zhou1 at amd.com
Fri Apr 10 04:18:02 UTC 2020
[AMD Official Use Only - Internal Distribution Only]
Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen at amd.com>
> Sent: April 10, 2020 11:55
> To: amd-gfx at lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; Clements, John <John.Clements at amd.com>
> Cc: Chen, Guchun <Guchun.Chen at amd.com>
> Subject: [PATCH] drm/amdgpu: add uncorrectable error count print in UMC
> ecc irq cb
>
> The uncorrectable error (UE) count is never printed after a UMC UE injection.
> By the time the GPU recovery work thread reaches the error count logging
> function, it can no longer read and print the count from the last injection,
> because the error status register is automatically cleared when it is read
> in the UMC ECC irq callback. Print the message in the UMC ECC irq callback
> instead, consistent with the other RAS error interrupt cases.
>
> Signed-off-by: Guchun Chen <guchun.chen at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index f4d40855147b..267f7c30f4dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -121,6 +121,9 @@ int amdgpu_umc_process_ras_data_cb(struct amdgpu_device *adev,
>
>  	/* only uncorrectable error needs gpu reset */
>  	if (err_data->ue_count) {
> +		dev_info(adev->dev, "%ld uncorrectable errors detected in UMC block\n",
> +			err_data->ue_count);
> +
>  		if (err_data->err_addr_cnt &&
>  		    amdgpu_ras_add_bad_pages(adev, err_data->err_addr,
>  						err_data->err_addr_cnt))
> --
> 2.17.1
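
The clear-on-read behavior described in the commit message can be illustrated with a small user-space sketch. This is not driver code: `struct fake_umc_reg` and the `fake_*` functions are hypothetical stand-ins, assumed only for illustration. It shows why the count has to be printed in the irq callback, since a later read from the recovery path would see an already-cleared register.

```c
#include <stdio.h>

/* Hypothetical stand-in for a clear-on-read error status register. */
struct fake_umc_reg {
	unsigned long ue_count;
};

/* Reading returns the latched count and clears it (assumed behavior,
 * modeled on the description in the commit message). */
static unsigned long fake_umc_read_ue_count(struct fake_umc_reg *reg)
{
	unsigned long val = reg->ue_count;

	reg->ue_count = 0;
	return val;
}

/* Irq-callback analogue: the only place the fresh count is visible,
 * so the message is printed here. */
static unsigned long fake_ecc_irq_cb(struct fake_umc_reg *reg)
{
	unsigned long ue = fake_umc_read_ue_count(reg);

	if (ue)
		printf("%lu uncorrectable errors detected in UMC block\n", ue);
	return ue;
}

/* Recovery-work analogue: a second read finds the register cleared. */
static unsigned long fake_recovery_log(struct fake_umc_reg *reg)
{
	return fake_umc_read_ue_count(reg);
}
```

With a register latched at 3 errors, calling `fake_ecc_irq_cb()` first reports 3, and a subsequent `fake_recovery_log()` on the same register returns 0, which is the situation the patch works around.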