[PATCH] drm/amdgpu: resolve mGPU RAS query instability

Zhang, Hawking Hawking.Zhang at amd.com
Tue Apr 7 03:55:03 UTC 2020


[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>

Per our discussion, please send a separate patch that replaces every "DRM_INFO" with "dev_info" in the per-IP query_ras_error_count callbacks, so that when all the RAS errors are harvested in a single GPU recovery worker, the log makes clear which errors come from which node.
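
To illustrate why the device-aware call helps on a multi-GPU system, here is a minimal userspace sketch. It is not driver code: mock_device, mock_drm_info and mock_dev_info are hypothetical stand-ins that only imitate the approximate log prefixes of the kernel's DRM_INFO and dev_info, and the PCI address and message text are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel struct device; only its identity matters here. */
struct mock_device {
    const char *driver;  /* driver name, e.g. "amdgpu" */
    const char *bus_id;  /* device name, e.g. a PCI address (made up below) */
};

/* Imitates DRM_INFO(): a subsystem-wide "[drm]" prefix, no device identity. */
int mock_drm_info(char *buf, size_t len, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    return snprintf(buf, len, "[drm] %s", msg);
}

/* Imitates dev_info(): the prefix names the driver and the device. */
int mock_dev_info(char *buf, size_t len, const struct mock_device *dev,
                  const char *msg)
{
    assert(buf != NULL && dev != NULL && msg != NULL);
    return snprintf(buf, len, "%s %s: %s", dev->driver, dev->bus_id, msg);
}
```

With two nodes reporting from the same worker, the mock_drm_info lines are indistinguishable, while each mock_dev_info line carries the address of the node it came from.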

Regards,
Hawking
From: Clements, John <John.Clements at amd.com>
Sent: Tuesday, April 7, 2020 11:03
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: [PATCH] drm/amdgpu: resolve mGPU RAS query instability


This patch resolves an issue where, upon receiving an uncorrectable RAS error, the RAS ISR is triggered on every GPU node, creating a race condition between querying the RAS errors and entering the GPU reset sequence.