[PATCH] drm/amd/amdgpu: move kfd post_reset out of reset_sriov function

Liu, Shaoyun Shaoyun.Liu at amd.com
Mon Nov 22 15:15:52 UTC 2021


[AMD Official Use Only]

ping

-----Original Message-----
From: Liu, Shaoyun 
Sent: Thursday, November 18, 2021 10:08 PM
To: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amd/amdgpu: move kfd post_reset out of reset_sriov function

[AMD Official Use Only]

Ping 

-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu at amd.com> 
Sent: Thursday, November 18, 2021 11:58 AM
To: amd-gfx at lists.freedesktop.org
Cc: Liu, Shaoyun <Shaoyun.Liu at amd.com>
Subject: [PATCH] drm/amd/amdgpu: move kfd post_reset out of reset_sriov function

For sriov XGMI  configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance .

Signed-off-by: shaoyunl <shaoyun.liu at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 10c8008d1da0..9a9d5493c676 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4308,7 +4308,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 
 	amdgpu_irq_gpu_reset_resume_helper(adev);
 	r = amdgpu_ib_ring_tests(adev);
-	amdgpu_amdkfd_post_reset(adev);
 
 error:
 	if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) { @@ -5081,7 +5080,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
 	/* Actual ASIC resets if needed.*/
-	/* TODO Implement XGMI hive reset logic for SRIOV */
+	/* Host driver will handle XGMI hive reset for SRIOV */
 	if (amdgpu_sriov_vf(adev)) {
 		r = amdgpu_device_reset_sriov(adev, job ? false : true);
 		if (r)
@@ -5141,8 +5140,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 skip_sched_resume:
 	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
-		/* unlock kfd: SRIOV would do it separately */
-		if (!need_emergency_restart && !amdgpu_sriov_vf(tmp_adev))
+		/* unlock kfd */
+		if (!need_emergency_restart)
 	                amdgpu_amdkfd_post_reset(tmp_adev);
 
 		/* kfd_post_reset will do nothing if kfd device is not initialized,
--
2.17.1


More information about the amd-gfx mailing list