<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2023-10-30 4:23, Christian König
wrote:<br>
</div>
<blockquote type="cite" cite="mid:8e81a28a-5d50-44df-b441-6ceb2133c5e9@amd.com">Am
28.10.23 um 00:39 schrieb Felix Kuehling:
<br>
<blockquote type="cite">Make restore workers freezable so we don't
have to explicitly flush them
<br>
in suspend and GPU reset code paths, and we don't accidentally
try to
<br>
restore BOs while the GPU is suspended. Not having to flush
restore_work
<br>
also helps avoid lock/fence dependencies in the GPU reset case
where we're
<br>
not allowed to wait for fences.
<br>
<br>
This is an RFC and request for testing. I have not tested this
myself yet.
<br>
My notes below:
<br>
<br>
Restore work won't be frozen during GPU reset. Does it matter?
Queues will
<br>
stay evicted until resume in any case. But restore_work may be
in trouble
<br>
if it gets run in the middle of a GPU reset. In that case, if
anything
<br>
fails, it will just reschedule itself, so it should be fine as long
as it
<br>
doesn't interfere with the reset itself (respects any mechanisms
in place
<br>
to prevent HW access during the reset).
<br>
<br>
What HW access does restore_work perform:
<br>
- migrating buffers: uses GPU scheduler, should be suspended
during reset
<br>
- TLB flushes in userptr restore worker: not called directly,
relies on
<br>
HWS to flush TLBs on VMID attachment
<br>
- TLB flushes in SVM restore worker: Does TLB flush in the
mapping code
<br>
- Resuming user mode queues: should not happen while GPU reset
keeps queue
<br>
eviction counter elevated
<br>
Ergo: Except for the SVM case, it's OK to not flush restore work
before
<br>
GPU resets. I'll need to rethink the interaction of SVM
restore_work with
<br>
GPU resets.
<br>
</blockquote>
<br>
It also sounds like the restore work is some high-level work and
shouldn't interact with the lower-level GPU reset.
<br>
</blockquote>
<p>That was my hope as well. Just trying to think it through to make
sure I'm not making any incorrect assumptions.</p>
<p>Do you know if there is anything preventing a TLB flush using
MMIO from messing up a GPU reset in progress? That's the only
thing in the SVM restore work that tries to touch HW directly.<br>
</p>
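<p>To make the question concrete, here is a hypothetical sketch of the
kind of guard I mean. It is not part of this patch; the helper name is
made up, the flush call is only indicative, and it assumes the SVM
flush path can get at the amdgpu_device of the node it is flushing:</p>
<pre>/* Hypothetical sketch only: skip the direct MMIO TLB flush while a
 * GPU reset owns the hardware. Queues stay evicted until resume in
 * any case, so skipping here presumably just defers the flush.
 */
static void svm_flush_tlb_if_safe(struct kfd_process_device *pdd,
				  struct amdgpu_device *adev)
{
	if (amdgpu_in_reset(adev))
		return;		/* reset in progress, don't touch HW */

	kfd_flush_tlb(pdd, TLB_FLUSH_LEGACY);
}
</pre>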
<p><br>
</p>
<blockquote type="cite" cite="mid:8e81a28a-5d50-44df-b441-6ceb2133c5e9@amd.com">
<br>
<blockquote type="cite">
<br>
What about cancelling p->eviction_work? Eviction work would
normally be
<br>
needed to signal eviction fences, but we're doing that
explicitly in
<br>
suspend_all_processes. Does eviction_work wait for fences
anywhere? Yes,
<br>
indirectly by flushing restore_work. So we should not try to
cancel it
<br>
during a GPU reset.
<br>
<br>
Problem: accessing p->ef concurrently in evict_process_worker
and
<br>
suspend_all_processes. Need a spinlock to use and update it
safely.
<br>
Problem: What if evict_process_worker gets stuck in flushing
restore_work?
<br>
We can skip all of that if p->ef is NULL (which it is during
the reset).
<br>
Even if it gets stuck, it's no problem if the reset doesn't
depend on it.
<br>
It should get unstuck after the reset.
<br>
<br>
Signed-off-by: Felix Kuehling <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>
<br>
---
<br>
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++--
<br>
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
<br>
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 49
+++++++++++++------
<br>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +-
<br>
4 files changed, 44 insertions(+), 19 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
<br>
index 54f31a420229..89e632257663 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
<br>
@@ -1644,7 +1644,8 @@ int amdgpu_amdkfd_criu_resume(void *p)
<br>
goto out_unlock;
<br>
}
<br>
WRITE_ONCE(pinfo->block_mmu_notifications, false);
<br>
- schedule_delayed_work(&pinfo->restore_userptr_work,
0);
<br>
+ queue_delayed_work(system_freezable_wq,
<br>
+ &pinfo->restore_userptr_work, 0);
<br>
out_unlock:
<br>
mutex_unlock(&pinfo->lock);
<br>
@@ -2458,7 +2459,8 @@ int amdgpu_amdkfd_evict_userptr(struct
mmu_interval_notifier *mni,
<br>
KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
<br>
if (r)
<br>
pr_err("Failed to quiesce KFD\n");
<br>
-
schedule_delayed_work(&process_info->restore_userptr_work,
<br>
+ queue_delayed_work(system_freezable_wq,
<br>
+ &process_info->restore_userptr_work,
<br>
msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
<br>
}
<br>
mutex_unlock(&process_info->notifier_lock);
<br>
@@ -2793,7 +2795,8 @@ static void
amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
<br>
/* If validation failed, reschedule another attempt */
<br>
if (evicted_bos) {
<br>
-
schedule_delayed_work(&process_info->restore_userptr_work,
<br>
+ queue_delayed_work(system_freezable_wq,
<br>
+ &process_info->restore_userptr_work,
<br>
msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
<br>
kfd_smi_event_queue_restore_rescheduled(mm);
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
<br>
index 9cc32f577e38..cf017d027fee 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
<br>
@@ -919,6 +919,7 @@ struct kfd_process {
<br>
* during restore
<br>
*/
<br>
struct dma_fence *ef;
<br>
+ spinlock_t ef_lock;
<br>
/* Work items for evicting and restoring BOs */
<br>
struct delayed_work eviction_work;
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
<br>
index fbf053001af9..a07cba58ec5e 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
<br>
@@ -664,7 +664,8 @@ int kfd_process_create_wq(void)
<br>
if (!kfd_process_wq)
<br>
kfd_process_wq = alloc_workqueue("kfd_process_wq", 0,
0);
<br>
if (!kfd_restore_wq)
<br>
- kfd_restore_wq =
alloc_ordered_workqueue("kfd_restore_wq", 0);
<br>
+ kfd_restore_wq =
alloc_ordered_workqueue("kfd_restore_wq",
<br>
+ WQ_FREEZABLE);
<br>
if (!kfd_process_wq || !kfd_restore_wq) {
<br>
kfd_process_destroy_wq();
<br>
@@ -1460,6 +1461,7 @@ static struct kfd_process
*create_process(const struct task_struct *thread)
<br>
kref_init(&process->ref);
<br>
mutex_init(&process->mutex);
<br>
+ spin_lock_init(&process->ef_lock);
<br>
process->mm = thread->mm;
<br>
process->lead_thread = thread->group_leader;
<br>
process->n_pdds = 0;
<br>
@@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct
kfd_process *p, struct kfd_node *node,
<br>
return -EINVAL;
<br>
}
<br>
+static void signal_eviction_fence(struct kfd_process *p)
<br>
+{
<br>
+ spin_lock(&p->ef_lock);
<br>
+ if (!p->ef)
<br>
+ goto unlock_out;
<br>
+ dma_fence_signal(p->ef);
<br>
</blockquote>
<br>
This needs to grab the internal lock of the eviction fence; I'm
not sure that has correct ordering with the newly added ef_lock.
<br>
</blockquote>
<p>Hmm, the only way we could get a circular lock dependency would
be if we took the p->ef_lock in a fence callback. I don't
think that ever happens, because even the eviction work runs on a
worker thread (exactly to avoid such lock dependency issues).</p>
<p>Anyway, I could try to move the fence_signal out of the critical
section. The lock is only there to ensure that exactly one process
signals and frees the fence.</p>
<pre>	spin_lock(&p->ef_lock);
	ef = p->ef;
	WRITE_ONCE(p->ef, NULL);
	spin_unlock(&p->ef_lock);
	if (ef) {
		dma_fence_signal(ef);
		dma_fence_put(ef);
	}
</pre>
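<p>Spelled out as a complete function, the reworked helper would look
roughly like this (a sketch of the change I have in mind, not what the
posted patch does):</p>
<pre>static void signal_eviction_fence(struct kfd_process *p)
{
	struct dma_fence *ef;

	/* Take ownership of the fence under the lock, so that exactly
	 * one caller gets to signal and free it.
	 */
	spin_lock(&p->ef_lock);
	ef = p->ef;
	WRITE_ONCE(p->ef, NULL);
	spin_unlock(&p->ef_lock);

	/* Signal and drop the reference outside the critical section,
	 * so the fence's internal lock is never taken under ef_lock.
	 */
	if (ef) {
		dma_fence_signal(ef);
		dma_fence_put(ef);
	}
}
</pre>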
<p><br>
</p>
<blockquote type="cite" cite="mid:8e81a28a-5d50-44df-b441-6ceb2133c5e9@amd.com">
<br>
<blockquote type="cite">+ dma_fence_put(p->ef);
<br>
+ WRITE_ONCE(p->ef, NULL);
<br>
+
<br>
+unlock_out:
<br>
+ spin_unlock(&p->ef_lock);
<br>
+}
<br>
+
<br>
static void evict_process_worker(struct work_struct *work)
<br>
{
<br>
int ret;
<br>
@@ -1916,8 +1931,11 @@ static void evict_process_worker(struct
work_struct *work)
<br>
* lifetime of this thread, kfd_process p will be valid
<br>
*/
<br>
p = container_of(dwork, struct kfd_process,
eviction_work);
<br>
- WARN_ONCE(p->last_eviction_seqno != p->ef->seqno,
<br>
- "Eviction fence mismatch\n");
<br>
+ /* If the eviction fence is not valid, we're probably in a
suspend
<br>
+ * or GPU reset cycle. There is nothing to do in this case.
<br>
+ */
<br>
+ if (!READ_ONCE(p->ef))
<br>
+ return;
<br>
</blockquote>
<br>
This evict process work is high level; I don't think it should have
any dependency on the GPU reset.
<br>
</blockquote>
<p>Right. This is not here to avoid issues, just a short-cut to
avoid unnecessary work.</p>
<p>Regards,<br>
Felix<br>
</p>
<p><br>
</p>
<blockquote type="cite" cite="mid:8e81a28a-5d50-44df-b441-6ceb2133c5e9@amd.com">
<br>
Regards,
<br>
Christian.
<br>
<br>
<blockquote type="cite"> /* Narrow window of overlap
between restore and evict work
<br>
* item is possible. Once
amdgpu_amdkfd_gpuvm_restore_process_bos
<br>
@@ -1930,9 +1948,7 @@ static void evict_process_worker(struct
work_struct *work)
<br>
pr_debug("Started evicting pasid 0x%x\n", p->pasid);
<br>
ret = kfd_process_evict_queues(p,
KFD_QUEUE_EVICTION_TRIGGER_TTM);
<br>
if (!ret) {
<br>
- dma_fence_signal(p->ef);
<br>
- dma_fence_put(p->ef);
<br>
- p->ef = NULL;
<br>
+ signal_eviction_fence(p);
<br>
queue_delayed_work(kfd_restore_wq,
&p->restore_work,
<br>
msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
<br>
@@ -1967,9 +1983,19 @@ static void
restore_process_worker(struct work_struct *work)
<br>
p->last_restore_timestamp = get_jiffies_64();
<br>
/* VMs may not have been acquired yet during debugging. */
<br>
- if (p->kgd_process_info)
<br>
+ if (p->kgd_process_info) {
<br>
+ struct dma_fence *ef = NULL;
<br>
+
<br>
ret =
amdgpu_amdkfd_gpuvm_restore_process_bos(p->kgd_process_info,
<br>
- &p->ef);
<br>
+ &ef);
<br>
+ if (!ret) {
<br>
+ spin_lock(&p->ef_lock);
<br>
+ WARN_ONCE(p->ef, "Eviction fence is not NULL");
<br>
+ WRITE_ONCE(p->ef, ef);
<br>
+ spin_unlock(&p->ef_lock);
<br>
+ }
<br>
+ }
<br>
+
<br>
if (ret) {
<br>
pr_debug("Failed to restore BOs of pasid 0x%x, retry
after %d ms\n",
<br>
p->pasid, PROCESS_BACK_OFF_TIME_MS);
<br>
@@ -1994,14 +2020,9 @@ void kfd_suspend_all_processes(void)
<br>
WARN(debug_evictions, "Evicting all processes");
<br>
hash_for_each_rcu(kfd_processes_table, temp, p,
kfd_processes) {
<br>
- cancel_delayed_work_sync(&p->eviction_work);
<br>
- flush_delayed_work(&p->restore_work);
<br>
-
<br>
if (kfd_process_evict_queues(p,
KFD_QUEUE_EVICTION_TRIGGER_SUSPEND))
<br>
pr_err("Failed to suspend process 0x%x\n",
p->pasid);
<br>
- dma_fence_signal(p->ef);
<br>
- dma_fence_put(p->ef);
<br>
- p->ef = NULL;
<br>
+ signal_eviction_fence(p);
<br>
}
<br>
srcu_read_unlock(&kfd_processes_srcu, idx);
<br>
}
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
index fe937670c927..aa6c34127be9 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
@@ -1869,7 +1869,7 @@ static void svm_range_restore_work(struct
work_struct *work)
<br>
/* If validation failed, reschedule another attempt */
<br>
if (evicted_ranges) {
<br>
pr_debug("reschedule to restore svm range\n");
<br>
- schedule_delayed_work(&svms->restore_work,
<br>
+ queue_delayed_work(system_freezable_wq,
&svms->restore_work,
<br>
msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
<br>
kfd_smi_event_queue_restore_rescheduled(mm);
<br>
@@ -1945,7 +1945,7 @@ svm_range_evict(struct svm_range *prange,
struct mm_struct *mm,
<br>
pr_debug("failed to quiesce KFD\n");
<br>
pr_debug("schedule to restore svm %p ranges\n",
svms);
<br>
- schedule_delayed_work(&svms->restore_work,
<br>
+ queue_delayed_work(system_freezable_wq,
&svms->restore_work,
<br>
msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
<br>
} else {
<br>
unsigned long s, l;
<br>
</blockquote>
<br>
</blockquote>
</body>
</html>