[PATCH] drm/amdkfd: avoid recursive lock in migrations back to RAM
Alex Sierra
alex.sierra at amd.com
Mon Nov 1 22:04:35 UTC 2021
[Why]:
During hmm_range_fault validation calls on VRAM migrations, there could
be cases where some pages within the range could be marked as Read Only
(COW) triggering a migration back to RAM. In this case, the migration to
RAM will try to grab mutexes that have been held already before the
hmm_range_fault call. Causing a recursive lock.
[How]:
Creating a task reference new member under prange struct. And setting
this with "current" reference, right before the hmm_range_fault is
called. This member is checked against "current" reference at
svm_migrate_to_ram callback function. If equal, the migration will be
ignored.
Signed-off-by: Alex Sierra <alex.sierra at amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 4 ++++
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
3 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index bff40e8bca67..eb19f44ec86d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -936,6 +936,10 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
pr_debug("failed find process at fault address 0x%lx\n", addr);
return VM_FAULT_SIGBUS;
}
+ if (READ_ONCE(p->svms.faulting_task) == current) {
+ pr_debug("skipping ram migration\n");
+ return 0;
+ }
addr >>= PAGE_SHIFT;
pr_debug("CPU page fault svms 0x%p address 0x%lx\n", &p->svms, addr);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f88666bdf57c..7b41a58b1ade 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -858,6 +858,7 @@ struct svm_range_list {
atomic_t evicted_ranges;
struct delayed_work restore_work;
DECLARE_BITMAP(bitmap_supported, MAX_GPU_INSTANCE);
+ struct task_struct *faulting_task;
};
/* Process data */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 939c863315ba..e9eeee2e571c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1492,9 +1492,11 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
next = min(vma->vm_end, end);
npages = (next - addr) >> PAGE_SHIFT;
+ WRITE_ONCE(p->svms.faulting_task, current);
r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
addr, npages, &hmm_range,
readonly, true, owner);
+ WRITE_ONCE(p->svms.faulting_task, NULL);
if (r) {
pr_debug("failed %d to get svm range pages\n", r);
goto unreserve_out;
@@ -2745,6 +2747,7 @@ int svm_range_list_init(struct kfd_process *p)
INIT_WORK(&svms->deferred_list_work, svm_range_deferred_list_work);
INIT_LIST_HEAD(&svms->deferred_range_list);
spin_lock_init(&svms->deferred_list_lock);
+ svms->faulting_task = NULL;
for (i = 0; i < p->n_pdds; i++)
if (KFD_IS_SVM_API_SUPPORTED(p->pdds[i]->dev))
--
2.32.0
More information about the amd-gfx
mailing list