[PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted

Zhigang Luo Zhigang.Luo at amd.com
Mon Apr 1 21:53:48 UTC 2024


If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.

Signed-off-by: Zhigang Luo <Zhigang.Luo at amd.com>
Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 041ec3de55e7..55f89c858c7a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -969,11 +969,11 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	if (!run_pm) {
 		mutex_lock(&kfd_processes_mutex);
 		count = ++kfd_locked;
-		mutex_unlock(&kfd_processes_mutex);
 
 		/* For first KFD device suspend all the KFD processes */
 		if (count == 1)
 			kfd_suspend_all_processes();
+		mutex_unlock(&kfd_processes_mutex);
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
-- 
2.25.1



More information about the amd-gfx mailing list