[PATCH] drm/amd/amdkfd: Fix kernel panic when a failed reset is triggered again

shaoyunl shaoyun.liu at amd.com
Sun Nov 14 17:53:59 UTC 2021


In an SR-IOV configuration, a reset may fail to bring the ASIC back to a normal
state even though stop_cpsch has already been called. Because there is no resume
in this case, start_cpsch is never called. When a reset is triggered again, the
driver must not run the uninitialization a second time.

Signed-off-by: shaoyunl <shaoyun.liu at amd.com>
---
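Note (not part of the commit): the guard can be illustrated outside the
kernel with a minimal user-space sketch. Everything named mock_* below is
hypothetical and only stands in for the real structures; it assumes the
start path sets sched_running and the stop path is the only place that
clears it. It is not the driver code, just a model of why a second stop
without an intervening start must skip the teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device_queue_manager; only the
 * fields needed to illustrate the guard are modelled. */
struct mock_dqm {
	bool sched_running;	/* set by start, cleared by stop */
	void *fence_mem;	/* stands in for dqm->fence_mem */
	int teardown_count;	/* how many times resources were released */
};

/* Allocate the resources and mark the scheduler as running. */
static int mock_start(struct mock_dqm *dqm)
{
	assert(dqm != NULL);
	dqm->fence_mem = malloc(64);
	if (!dqm->fence_mem)
		return -1;
	dqm->sched_running = true;
	return 0;
}

/* Release the resources only if a matching start happened.
 * A second call without a start in between is a no-op, which
 * avoids freeing fence_mem twice. */
static int mock_stop(struct mock_dqm *dqm)
{
	assert(dqm != NULL);
	if (dqm->sched_running) {
		dqm->sched_running = false;
		free(dqm->fence_mem);
		dqm->fence_mem = NULL;
		dqm->teardown_count++;
	}
	return 0;
}
```

Calling mock_start() once and mock_stop() twice in this sketch releases
the memory exactly once, which mirrors the failed-reset-then-reset-again
sequence described in the commit message.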
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 42b2cc999434..bcc8980d77e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1228,12 +1228,14 @@ static int stop_cpsch(struct device_queue_manager *dqm)
 	if (!dqm->is_hws_hang)
 		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
 	hanging = dqm->is_hws_hang || dqm->is_resetting;
-	dqm->sched_running = false;
 
-	pm_release_ib(&dqm->packet_mgr);
+	if (dqm->sched_running) {
+		dqm->sched_running = false;
+		pm_release_ib(&dqm->packet_mgr);
+		kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
+		pm_uninit(&dqm->packet_mgr, hanging);
+	}
 
-	kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
-	pm_uninit(&dqm->packet_mgr, hanging);
 	dqm_unlock(dqm);
 
 	return 0;
-- 
2.17.1


