[PATCH] drm/amdgpu: Log RAS errors during load

Zhang, Hawking Hawking.Zhang at amd.com
Sun May 11 23:52:03 UTC 2025


[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>

Regards,
Hawking
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar at amd.com>
Sent: Friday, May 9, 2025 23:32
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Log RAS errors during load

<Ping>

On 5/6/2025 4:45 PM, Lijo Lazar wrote:
> During driver load, the RAS event manager may not be initialized yet.
> Any ATHUB event raised during load is then dropped without appearing
> in the dmesg log. Log the error to dmesg for easier diagnosis.
>
> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 82013b495436..85a70b646840 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -4485,8 +4485,11 @@ void amdgpu_ras_global_ras_isr(struct amdgpu_device *adev)
>               enum ras_event_type type = RAS_EVENT_TYPE_FATAL;
>               u64 event_id;
>
> -             if (amdgpu_ras_mark_ras_event(adev, type))
> +             if (amdgpu_ras_mark_ras_event(adev, type)) {
> +                     dev_err(adev->dev,
> +                             "uncorrectable hardware error (ERREVENT_ATHUB_INTERRUPT) detected!\n");
>                       return;
> +             }
>
>               event_id = amdgpu_ras_acquire_event_id(adev, type);
>
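
For readers following the thread, the control flow the patch changes can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the driver code: struct sketch_dev, sketch_mark_event() and
sketch_global_ras_isr() are hypothetical stand-ins, fprintf() stands in for
dev_err(), and the -EINVAL return for an uninitialized event manager is an
assumption made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device RAS state (not the real struct). */
struct sketch_dev {
	bool event_mgr_ready;   /* false while the driver is still loading */
	unsigned long next_id;  /* last event id handed out */
	int early_logs;         /* messages emitted on the early-return path */
};

/* Assumption: marking fails with -EINVAL when the event manager is not set up. */
static int sketch_mark_event(const struct sketch_dev *dev)
{
	return dev->event_mgr_ready ? 0 : -EINVAL;
}

/*
 * Mirrors the shape of the patched branch: if the event cannot be marked,
 * log the fatal error before returning instead of returning silently.
 * Returns the acquired event id, or 0 when the handler bailed out early.
 */
static unsigned long sketch_global_ras_isr(struct sketch_dev *dev)
{
	if (sketch_mark_event(dev)) {
		fprintf(stderr,
			"uncorrectable hardware error (ERREVENT_ATHUB_INTERRUPT) detected!\n");
		dev->early_logs++;
		return 0;
	}

	/* Normal path: acquire an event id and continue handling. */
	return ++dev->next_id;
}
```

Calling sketch_global_ras_isr() on a device with event_mgr_ready set to false
prints the message once and returns 0. Once the flag is true, the same call
skips the early-return message and returns the next event id.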


