[PATCH v2] drm/amdkfd: improve performance with XNACK enabled
James Zhu
James.Zhu at amd.com
Thu Jun 12 19:53:05 UTC 2025
When XNACK is on, a hang or low performance is observed with some test
cases. The restore pages process can get stuck unexpectedly during
eviction/restoration when a BO has the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED
flag set:
1. With XNACK on, a retry page fault invokes the restore pages process.
2. A. If there is enough VRAM space, pages are simply migrated from RAM
      to VRAM.
   B. If there is not enough VRAM space left, the resource LRU list is
      searched and an eviction work item is scheduled to evict the LRU
      BO from VRAM to RAM first; then the restore pages process either
      resumes, or waits for the eviction timeout and tries to schedule
      eviction of the next LRU BO.
3. For case 2B, if the BO has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED set,
   a queue eviction is triggered, so the restore work queue is scheduled.
4. In step 1, the restore pages process holds one mm->mmap_lock read
   until restoring pages is completed.
   In step 2B, the eviction work queue process holds one mm->mmap_lock
   read until evicting the BO is completed.
   In step 3, the restore work queue process tries to acquire one
   mm->mmap_lock write, which it can only get after the above two
   mm->mmap_lock reads are released; while it waits, it blocks all
   following mm->mmap_lock read requests.
5. In step 2, if the first evicted BO is big enough for the step 1
   restore pages request, everything is fine. If not, the mm->mmap_lock
   read from step 1 is not released right away. In step 3, the first
   evicted BO's restore work queue then competes for the mm->mmap_lock
   write, and the second and following LRU BOs' eviction work queues are
   blocked trying to acquire the mm->mmap_lock read until they time out.
   The whole restore pages process is stuck here (see the sketch below).
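For illustration only (this sketch is not part of the patch and the
function names are hypothetical), the three paths above contend on
mm->mmap_lock roughly like this:

/*
 * Illustrative sketch, not from this patch: the three paths and the
 * mm->mmap_lock they take. Function names are made up.
 */
#include <linux/mm_types.h>
#include <linux/mmap_lock.h>

/* Step 1: restore pages, triggered by a retry page fault. */
static void restore_pages_path(struct mm_struct *mm)
{
	mmap_read_lock(mm);	/* held until all pages are restored */
	/* ... migrate pages from RAM to VRAM ... */
	mmap_read_unlock(mm);
}

/* Step 2B: eviction work for one LRU BO. */
static void eviction_work_path(struct mm_struct *mm)
{
	mmap_read_lock(mm);	/* readers queue behind a waiting writer */
	/* ... evict one BO from VRAM to RAM ... */
	mmap_read_unlock(mm);
}

/* Step 3: restore work scheduled by the ALWAYS_MAPPED queue eviction. */
static void restore_work_path(struct mm_struct *mm)
{
	mmap_write_lock(mm);	/* waits for both readers above and, while
				 * waiting, blocks every later reader */
	/* ... revalidate and map SVM ranges ... */
	mmap_write_unlock(mm);
}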
Using down_write_trylock instead of mmap_write_lock avoids blocking the
second and following eviction work queue processes.
-v2: just return if the write lock cannot be acquired, and let the caller
decide whether to retry.
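As a hedged example of what a caller can do with the new -EAGAIN return
(restore_work, system_freezable_wq and the 1 ms delay mirror existing
kfd_svm code, but treat them as assumptions here):

/*
 * Hedged sketch of a caller reacting to -EAGAIN instead of busy-waiting
 * for the write lock. Not part of this patch.
 */
#include <linux/jiffies.h>
#include <linux/mmap_lock.h>
#include <linux/workqueue.h>
#include "kfd_svm.h"

static void example_restore_worker(struct svm_range_list *svms,
				   struct mm_struct *mm)
{
	if (svm_range_list_lock_and_flush_work(svms, mm)) {
		/*
		 * Write lock not available: let the eviction workers that
		 * hold or wait on the read lock make progress, then try
		 * again later.
		 */
		queue_delayed_work(system_freezable_wq, &svms->restore_work,
				   msecs_to_jiffies(1));
		return;
	}
	/* ... revalidate and map ranges with the write lock held ... */
	mmap_write_unlock(mm);
}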
Signed-off-by: James Zhu <James.Zhu at amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +++-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 19 ++++++++++++++-----
drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 2 +-
3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a2149afa5803..ba232cc13e9b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1068,7 +1068,9 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
/* Flush pending deferred work to avoid racing with deferred actions
* from previous memory map changes (e.g. munmap).
*/
- svm_range_list_lock_and_flush_work(&p->svms, current->mm);
+ err = svm_range_list_lock_and_flush_work(&p->svms, current->mm);
+ if (err)
+ return err;
mutex_lock(&p->svms.lock);
mmap_write_unlock(current->mm);
if (interval_tree_iter_first(&p->svms.objects,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 865dca2547de..d4eccc6567ad 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1785,16 +1785,21 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
* @svms: the svm range list
* @mm: the mm structure
*
- * Context: Returns with mmap write lock held, pending deferred work flushed
+ * If the write lock is acquired, returns with the mmap write lock held
+ * and pending deferred work flushed; otherwise returns -EAGAIN and lets
+ * the caller decide whether to retry. Busy-waiting for the write lock
+ * here would block the restore pages process, which takes the read lock
+ * at the start and asks for it again during migration.
*
*/
-void
+int
svm_range_list_lock_and_flush_work(struct svm_range_list *svms,
struct mm_struct *mm)
{
retry_flush_work:
flush_work(&svms->deferred_list_work);
- mmap_write_lock(mm);
+ if (!down_write_trylock(&(mm->mmap_lock)))
+ return -EAGAIN;
if (list_empty(&svms->deferred_range_list))
- return;
+ return 0;
@@ -1833,7 +1838,8 @@ static void svm_range_restore_work(struct work_struct *work)
}
mutex_lock(&process_info->lock);
- svm_range_list_lock_and_flush_work(svms, mm);
+ if (svm_range_list_lock_and_flush_work(svms, mm))
+ goto out_reschedule1;
mutex_lock(&svms->lock);
evicted_ranges = atomic_read(&svms->evicted_ranges);
@@ -1885,6 +1891,7 @@ static void svm_range_restore_work(struct work_struct *work)
out_reschedule:
mutex_unlock(&svms->lock);
mmap_write_unlock(mm);
+out_reschedule1:
mutex_unlock(&process_info->lock);
/* If validation failed, reschedule another attempt */
@@ -3638,7 +3645,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
mutex_lock(&process_info->lock);
- svm_range_list_lock_and_flush_work(svms, mm);
+ r = svm_range_list_lock_and_flush_work(svms, mm);
+ if (r)
+ goto out;
r = svm_range_is_valid(p, start, size);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 01c7a4877904..c8c9bc7eead9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -196,7 +196,7 @@ int kfd_criu_restore_svm(struct kfd_process *p,
int kfd_criu_resume_svm(struct kfd_process *p);
struct kfd_process_device *
svm_range_get_pdd_by_node(struct svm_range *prange, struct kfd_node *node);
-void svm_range_list_lock_and_flush_work(struct svm_range_list *svms, struct mm_struct *mm);
+int svm_range_list_lock_and_flush_work(struct svm_range_list *svms, struct mm_struct *mm);
/* SVM API and HMM page migration work together, device memory type
* is initialized to not 0 when page migration register device memory.
--
2.34.1