<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2021-04-22 9:20 a.m., Felix Kuehling
wrote:<br>
</div>
<blockquote type="cite" cite="mid:5a0dcda1-e270-c109-cfb2-eb882bda0507@amd.com">
<pre class="moz-quote-pre" wrap="">On 2021-04-22 at 9:08 a.m., philip yang wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On 2021-04-20 9:25 p.m., Felix Kuehling wrote:
@@ -2251,14 +2330,34 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap=""> }
mmap_read_lock(mm);
+retry_write_locked:
mutex_lock(&svms->lock);
prange = svm_range_from_addr(svms, addr, NULL);
if (!prange) {
pr_debug("failed to find prange svms 0x%p address [0x%llx]\n",
svms, addr);
- r = -EFAULT;
- goto out_unlock_svms;
+ if (!write_locked) {
+ /* Need the write lock to create new range with MMU notifier.
+ * Also flush pending deferred work to make sure the interval
+ * tree is up to date before we add a new range
+ */
+ mutex_unlock(&svms->lock);
+ mmap_read_unlock(mm);
+ svm_range_list_lock_and_flush_work(svms, mm);
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">I think this can deadlock with a deferred worker trying to drain
interrupts (Philip's patch series). If we cannot flush deferred work
here, we need to be more careful creating new ranges to make sure they
don't conflict with added deferred or child ranges.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
A deadlock with the deferred worker that drains interrupts is impossible,
because the interrupt drain waits for restore_pages without taking any
lock, and restore_pages flushes deferred work without taking any lock
either.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">The deadlock does not come from holding or waiting for locks. It comes
from the worker waiting for interrupts to drain and the interrupt
handler waiting for the worker to finish with flush_work in
svm_range_list_lock_and_flush_work. If both are waiting for each other,
neither can make progress and you have a deadlock.
</pre>
</blockquote>
    <p>Yes, you are right. I can reproduce the deadlock after changing the
      kfdtest. We cannot flush deferred work here.</p>
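    <p>For reference, the circular wait described above can be modeled as a
      two-node wait-for graph. This is only a user-space sketch in plain C,
      not driver code: the enum labels and the functions has_circular_wait
      and restore_pages_can_deadlock are hypothetical names made up for
      illustration. It assumes each party blocks on at most one other party,
      which matches the two waits discussed in this thread (the worker
      draining interrupts, and the handler calling flush_work).</p>

```c
/* Hypothetical labels for the two parties in this thread; not kernel types. */
enum party { DEFERRED_WORKER, RESTORE_PAGES_HANDLER, NUM_PARTIES };

#define WAITS_FOR_NOBODY (-1)

/*
 * waits_for[p] is the party that p blocks on, or WAITS_FOR_NOBODY.
 * Assumption: every party has at most one outgoing edge, so following
 * the edges from a start node either ends or comes back to that node
 * within NUM_PARTIES steps.
 * Returns 1 if some party transitively waits for itself, otherwise 0.
 */
static int has_circular_wait(const int waits_for[NUM_PARTIES])
{
	int start;

	for (start = 0; start != NUM_PARTIES; start++) {
		int cur = waits_for[start];
		int steps;

		for (steps = 0; steps != NUM_PARTIES; steps++) {
			if (cur == WAITS_FOR_NOBODY)
				break;		/* chain ends, no cycle from here */
			if (cur == start)
				return 1;	/* start waits on itself: deadlock */
			cur = waits_for[cur];
		}
	}
	return 0;
}

/*
 * Models the scenario from this thread. No lock appears anywhere in the
 * model; the cycle comes purely from the two waits.
 */
static int restore_pages_can_deadlock(int handler_flushes_work)
{
	int waits_for[NUM_PARTIES];

	/* The worker drains interrupts, so it waits for the handler to return. */
	waits_for[DEFERRED_WORKER] = RESTORE_PAGES_HANDLER;

	/* The handler waits for the worker only if it calls flush_work(). */
	if (handler_flushes_work)
		waits_for[RESTORE_PAGES_HANDLER] = DEFERRED_WORKER;
	else
		waits_for[RESTORE_PAGES_HANDLER] = WAITS_FOR_NOBODY;

	return has_circular_wait(waits_for);
}
```

    <p>Calling restore_pages_can_deadlock(1) models the patch as posted and
      reports the cycle; restore_pages_can_deadlock(0) models a handler that
      does not flush deferred work and reports none.</p>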
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:5a0dcda1-e270-c109-cfb2-eb882bda0507@amd.com">
<pre class="moz-quote-pre" wrap="">
Regards,
Felix
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Regards,
Philip
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Regards,
Felix
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">+ write_locked = true;
+ goto retry_write_locked;
+ }
+ prange = svm_range_create_unregistered_range(adev, p, mm, addr);
+ if (!prange) {
+ pr_debug("failed to create unregistered range svms 0x%p address [0x%llx]\n",
+ svms, addr);
+ mmap_write_downgrade(mm);
+ r = -EFAULT;
+ goto out_unlock_svms;
+ }
}
+ if (write_locked)
+ mmap_write_downgrade(mm);
mutex_lock(&prange->migrate_mutex);
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>