[PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

Lou, Wentao Wentao.Lou at amd.com
Tue Dec 11 04:53:47 UTC 2018


On SRIOV, amdgpu_amdkfd_pre_reset should not be called inside amdgpu_device_lock_adev, nor amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, SRIOV already calls amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
These two functions need to run inside the amdgpu_virt_request_full_gpu section for SRIOV; otherwise SRIOV hangs.
The amdgpu_amdkfd_pre_reset call inside amdgpu_device_lock_adev is therefore a duplicate for SRIOV, and it causes the hang when amdgpu_device_lock_adev is entered.
That is the reason for adding "if (!amdgpu_sriov_vf(adev))", based on branch amd-staging-dkms-4.18.
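
As a rough illustration of the ordering described above, here is a toy user-space model in C. It is not driver code: struct toy_adev, the toy_* functions, the has_full_gpu flag and the bad_calls counter are all made up for this sketch, and it assumes that the SRIOV reset path makes its own KFD calls between requesting and releasing full GPU access. It only counts how many KFD pre/post reset calls a VF would make outside that window, with and without the amdgpu_sriov_vf check.

```c
#include <stdbool.h>

/* Toy stand-in for struct amdgpu_device; only what this sketch needs. */
struct toy_adev {
	bool is_sriov_vf;   /* models amdgpu_sriov_vf(adev) */
	bool has_full_gpu;  /* true between request and release of full GPU access */
	int kfd_pre_calls;  /* number of KFD pre_reset calls */
	int kfd_post_calls; /* number of KFD post_reset calls */
	int bad_calls;      /* KFD calls made by a VF without full GPU access */
};

/* Record a KFD call and flag it if a VF makes it outside the window. */
static void toy_kfd_call(struct toy_adev *adev, int *counter)
{
	(*counter)++;
	if (adev->is_sriov_vf && !adev->has_full_gpu)
		adev->bad_calls++;
}

/* Models amdgpu_device_lock_adev; 'patched' enables the SRIOV check. */
static void toy_lock_adev(struct toy_adev *adev, bool patched)
{
	if (!patched || !adev->is_sriov_vf)
		toy_kfd_call(adev, &adev->kfd_pre_calls);
}

/* Models amdgpu_device_unlock_adev; 'patched' enables the SRIOV check. */
static void toy_unlock_adev(struct toy_adev *adev, bool patched)
{
	if (!patched || !adev->is_sriov_vf)
		toy_kfd_call(adev, &adev->kfd_post_calls);
}

/* Models the SRIOV reset path (assumed ordering, see the note above). */
static void toy_reset_sriov(struct toy_adev *adev)
{
	adev->has_full_gpu = true;                  /* request full GPU access */
	toy_kfd_call(adev, &adev->kfd_pre_calls);   /* KFD pre_reset inside the window */
	/* ... the actual VF reset would happen here ... */
	toy_kfd_call(adev, &adev->kfd_post_calls);  /* KFD post_reset inside the window */
	adev->has_full_gpu = false;                 /* release full GPU access */
}

/* One recovery pass: lock, reset (SRIOV path only for a VF), unlock. */
void toy_gpu_recover(struct toy_adev *adev, bool patched)
{
	toy_lock_adev(adev, patched);
	if (adev->is_sriov_vf)
		toy_reset_sriov(adev);
	toy_unlock_adev(adev, patched);
}
```

Under these assumptions, toy_gpu_recover(&vf, false) on a VF leaves vf.bad_calls at 2 (one pre_reset and one post_reset outside the window, on top of the calls made inside it), while toy_gpu_recover(&vf, true) leaves it at 0 with each KFD hook called exactly once. On bare metal the check changes nothing.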

BR,
Wentao

-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu at amd.com> 
Sent: Tuesday, December 11, 2018 12:10 AM
To: Lou, Wentao <Wentao.Lou at amd.com>; amd-gfx at lists.freedesktop.org; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: Lou, Wentao <Wentao.Lou at amd.com>
Subject: RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

But KFD still needs to be notified during reset: the pre_reset call gives KFD a chance to suspend all the running process queues. Did reset work normally on SRIOV before the refactoring change for XGMI support? We shouldn't change the logic.

Regards
shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of wentalou
Sent: Friday, December 7, 2018 1:09 AM
To: amd-gfx at lists.freedesktop.org
Cc: Lou, Wentao <Wentao.Lou at amd.com>
Subject: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev, which is outside req_full_gpu on SRIOV.
This makes SRIOV hang during reset.

Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/* Unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
--
2.7.4
