[PATCH] drm/amdkfd: Reset GPU on queue preemption failure

Harish Kasiviswanathan Harish.Kasiviswanathan at amd.com
Tue Mar 26 20:02:06 UTC 2024


Currently, with F32 HWS GPU reset is only when unmap queue fails.

However, if compute queue doesn't repond to preemption request in time
unmap will return without any error. In this case, only preemption error
is logged and Reset is not triggered. Call GPU reset in this case also.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan at amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 151fabf84040..c08b6ee25289 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2000,6 +2000,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
 	if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
 		while (halt_if_hws_hang)
 			schedule();
+		kfd_hws_hang(dqm);
 		return -ETIME;
 	}
 
-- 
2.34.1



More information about the amd-gfx mailing list