[PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
Lou, Wentao
Wentao.Lou at amd.com
Tue Dec 11 04:53:47 UTC 2018
Sriov should not call amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev,
either amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, sriov already called amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
There 2 functions needed to be inside sriov's amdgpu_virt_request_full_gpu, or it would cause sriov hang.
amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev was duplicated for sriov, and caused cause sriov hang when entering amdgpu_device_lock_adev.
That's the reason to add " if (!amdgpu_sriov_vf(adev))" based on branch amd-staging-dkms-4.18.
BR,
Wentao
-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu at amd.com>
Sent: Tuesday, December 11, 2018 12:10 AM
To: Lou, Wentao <Wentao.Lou at amd.com>; amd-gfx at lists.freedesktop.org; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: Lou, Wentao <Wentao.Lou at amd.com>
Subject: RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
But KFD still need to be notified during reset , the pre_reset call to KFD will let KFD have a chance to suspend all the running process queues. Was the reset works normally on SRIOV before the refactor change for XGMI support ? We shouldn't change the logic .
Regards
shaoyun.liu
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of wentalou
Sent: Friday, December 7, 2018 1:09 AM
To: amd-gfx at lists.freedesktop.org
Cc: Lou, Wentao <Wentao.Lou at amd.com>
Subject: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev, but outside req_full_gpu of sriov.
It would make sriov hang during reset.
Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
mutex_lock(&adev->lock_reset);
atomic_inc(&adev->gpu_reset_counter);
adev->in_gpu_reset = 1;
- /* Block kfd */
- amdgpu_amdkfd_pre_reset(adev);
+ /* Block kfd: SRIOV would do it separately */
+ if (!amdgpu_sriov_vf(adev))
+ amdgpu_amdkfd_pre_reset(adev);
}
static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) {
- /*unlock kfd */
- amdgpu_amdkfd_post_reset(adev);
+ /*unlock kfd: SRIOV would do it separately */
+ if (!amdgpu_sriov_vf(adev))
+ amdgpu_amdkfd_post_reset(adev);
amdgpu_vf_error_trans_all(adev);
adev->in_gpu_reset = 0;
mutex_unlock(&adev->lock_reset);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list