[PATCH] drm/amdkfd: Skip locking KFD when unbinding GPU

Lawrence Yiu lawyiu.dev at gmail.com
Mon Nov 6 07:14:05 UTC 2023


After unbinding a GPU, KFD becomes locked and unusable: applications can
no longer use ROCm for compute, and rocminfo prints the following error
message:

ROCk module is loaded
Unable to open /dev/kfd read-write: Invalid argument

KFD remains locked even after the same GPU is rebound, and only a system
reboot unlocks it. Fix this by skipping the KFD lock in kgd2kfd_suspend()
when the DRM device has been unplugged, as happens during GPU unbind.

Closes: https://github.com/RadeonOpenCompute/ROCm/issues/629
Signed-off-by: Lawrence Yiu <lawyiu.dev at gmail.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..c9436039e619 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -949,8 +949,8 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	if (!kfd->init_complete)
 		return;
 
-	/* for runtime suspend, skip locking kfd */
-	if (!run_pm) {
+	/* for runtime suspend or GPU unbind, skip locking kfd */
+	if (!run_pm && !drm_dev_is_unplugged(adev_to_drm(kfd->adev))) {
 		mutex_lock(&kfd_processes_mutex);
 		count = ++kfd_locked;
 		mutex_unlock(&kfd_processes_mutex);
-- 
2.34.1



More information about the amd-gfx mailing list