[PATCH] drm/amdkfd: Skip locking KFD when unbinding GPU
Lawrence Yiu
lawyiu.dev at gmail.com
Mon Nov 6 07:14:05 UTC 2023
After unbinding a GPU, KFD becomes locked and unusable. Applications can
no longer use ROCm for compute, and rocminfo prints the following error
message:
    ROCk module is loaded
    Unable to open /dev/kfd read-write: Invalid argument
KFD remains locked even after rebinding the same GPU, and a system reboot
is required to unlock it. Fix this by not locking KFD during the GPU
unbind process.
Closes: https://github.com/RadeonOpenCompute/ROCm/issues/629
Signed-off-by: Lawrence Yiu <lawyiu.dev at gmail.com>
---
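Note for reviewers: the following is a minimal user-space sketch of the
lock accounting this patch changes, not the driver code. All model_* names
and the "patched" flag are hypothetical. It assumes that the only release
of the lock is a matching non-runtime resume, which never happens after an
unbind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global kfd_locked counter. */
static int model_locked;

/*
 * Models the condition in kgd2kfd_suspend(). With patched == false only
 * runtime suspend skips the lock; with patched == true an unplugged
 * (unbound) device skips it as well.
 */
static void model_suspend(bool run_pm, bool unplugged, bool patched)
{
	bool skip = run_pm || (patched && unplugged);

	if (!skip)
		++model_locked;
}

/* Assumed counterpart: a non-runtime resume drops the lock again. */
static void model_resume(bool run_pm)
{
	if (!run_pm) {
		--model_locked;
		assert(model_locked >= 0);
	}
}

/* While this returns true, the model treats KFD as unusable. */
static bool model_is_locked(void)
{
	return model_locked > 0;
}
```

In this model a system suspend followed by a resume leaves the counter
balanced. An unbind with patched == false increments the counter with no
later decrement, so model_is_locked() stays true. With patched == true the
unbind leaves the counter untouched.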
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..c9436039e619 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -949,8 +949,8 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	if (!kfd->init_complete)
 		return;
 
-	/* for runtime suspend, skip locking kfd */
-	if (!run_pm) {
+	/* for runtime suspend or GPU unbind, skip locking kfd */
+	if (!run_pm && !drm_dev_is_unplugged(adev_to_drm(kfd->adev))) {
 		mutex_lock(&kfd_processes_mutex);
 		count = ++kfd_locked;
 		mutex_unlock(&kfd_processes_mutex);
--
2.34.1