[PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

Deng, Emily Emily.Deng at amd.com
Tue Dec 17 10:38:41 UTC 2019

[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng <Emily.Deng at amd.com>

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Monk Liu
>Sent: Tuesday, December 17, 2019 6:20 PM
>To: amd-gfx at lists.freedesktop.org
>Cc: Liu, Monk <Monk.Liu at amd.com>
>Subject: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
>MEC is ruined by amdkfd_pre_reset() when it runs after the VF FLR is done.
>
>amdkfd_pre_reset() ruins the MEC if it runs after the hypervisor has finished
>the VF FLR. The correct sequence is to call amdkfd_pre_reset() before the VF
>FLR, but a limitation blocks that sequence: if pre_reset() runs before the VF
>FLR, it does its register access through the KIQ and gets stuck there, because
>the KIQ probably won't work by that time (e.g. GFX is already hung).
>
>So the best option right now is to simply remove the call.
>Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 --
> 1 file changed, 2 deletions(-)
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 605cef6..ae962b9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3672,8 +3672,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
> 	if (r)
> 		return r;
> 
>-	amdgpu_amdkfd_pre_reset(adev);
>-
> 	/* Resume IP prior to SMC */
> 	r = amdgpu_device_ip_reinit_early_sriov(adev);
> 	if (r)
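
The ordering problem described in the quoted commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (sim_vf, sim_vf_flr, sim_kfd_pre_reset, sim_kiq_ring_test, sim_recover) is hypothetical and not an amdgpu symbol, and the two boolean flags are a deliberate simplification of the real hardware state. It assumes only what the message states: pre-reset needs a working KIQ for register access, pre-reset disturbs the MEC, and the VF FLR brings the engines back.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not amdgpu symbols. */
struct sim_vf {
	bool kiq_alive;	/* can the KIQ carry register accesses? */
	bool mec_ok;	/* is the MEC state usable? */
};

/* Models the hypervisor finishing the VF FLR: engines come back clean. */
static void sim_vf_flr(struct sim_vf *vf)
{
	vf->kiq_alive = true;
	vf->mec_ok = true;
}

/* Models pre-reset: needs KIQ register access and disturbs the MEC. */
static int sim_kfd_pre_reset(struct sim_vf *vf)
{
	if (!vf->kiq_alive)
		return -ETIMEDOUT;	/* stands in for "stuck in the KIQ path" */
	vf->mec_ok = false;
	return 0;
}

/* Models the KIQ ring test that runs once recovery is complete. */
static int sim_kiq_ring_test(const struct sim_vf *vf)
{
	return (vf->kiq_alive && vf->mec_ok) ? 0 : -EIO;
}

enum sim_order {
	SIM_PRE_RESET_THEN_FLR,	/* "correct" order, blocked by the hung KIQ */
	SIM_FLR_THEN_PRE_RESET,	/* order before the patch */
	SIM_FLR_ONLY,		/* order after the patch */
};

/* Runs one recovery sequence starting from a hung GFX/KIQ. */
static int sim_recover(enum sim_order order)
{
	struct sim_vf vf = { .kiq_alive = false, .mec_ok = false };
	int r;

	switch (order) {
	case SIM_PRE_RESET_THEN_FLR:
		r = sim_kfd_pre_reset(&vf);
		if (r)
			return r;
		sim_vf_flr(&vf);
		break;
	case SIM_FLR_THEN_PRE_RESET:
		sim_vf_flr(&vf);
		r = sim_kfd_pre_reset(&vf);
		if (r)
			return r;
		break;
	case SIM_FLR_ONLY:
		sim_vf_flr(&vf);
		break;
	}
	return sim_kiq_ring_test(&vf);
}
```

In this model, sim_recover(SIM_PRE_RESET_THEN_FLR) stalls with -ETIMEDOUT, sim_recover(SIM_FLR_THEN_PRE_RESET) fails the ring test with -EIO, and sim_recover(SIM_FLR_ONLY) returns 0, which mirrors the reasoning for dropping the call rather than reordering it.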