<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 7/16/2025 12:57 PM, Gang Ba wrote:<br>
</div>
<blockquote type="cite" cite="mid:20250716175753.703955-1-Gang.Ba@amd.com">
<pre wrap="" class="moz-quote-pre">If vm belongs to another process, this is fclose after fork,
wait may enable signaling KFD eviction fence and cause parent process queue evicted.</pre>
</blockquote>
<p>The commit message does not target the issue description. <span style="white-space: pre-wrap">amdgpu_flush</span> got trigger from
child process when it makes execve system call because render node
has O_CLOEXEC flag. fork only does not close inherited file
descriptors from child process. The back trace below also shows
that.</p>
<p>Regards</p>
<p>Xiaogang<br>
</p>
<blockquote type="cite" cite="mid:20250716175753.703955-1-Gang.Ba@amd.com">
<pre wrap="" class="moz-quote-pre">
[677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
[677852.634814] __dma_fence_enable_signaling+0x3e/0xe0
[677852.634820] dma_fence_wait_timeout+0x3a/0x140
[677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
[677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
[677852.635026] amdgpu_flush+0x34/0x50 [amdgpu]
[677852.635208] filp_flush+0x38/0x90
[677852.635213] filp_close+0x14/0x30
[677852.635216] do_close_on_exec+0xdd/0x130
[677852.635221] begin_new_exec+0x1da/0x490
[677852.635225] load_elf_binary+0x307/0xea0
[677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5
[677852.635235] ? ima_bprm_check+0xa2/0xd0
[677852.635240] search_binary_handler+0xda/0x260
[677852.635245] exec_binprm+0x58/0x1a0
[677852.635249] bprm_execve.part.0+0x16f/0x210
[677852.635254] bprm_execve+0x45/0x80
[677852.635257] do_execveat_common.isra.0+0x190/0x200
Suggested-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
Signed-off-by: Gang Ba <a class="moz-txt-link-rfc2396E" href="mailto:Gang.Ba@amd.com"><Gang.Ba@amd.com></a>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ea9b0f050f79..2f75f967f95f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2414,13 +2414,13 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size,
*/
long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout)
{
- timeout = dma_resv_wait_timeout(vm->root.bo->tbo.base.resv,
- DMA_RESV_USAGE_BOOKKEEP,
- true, timeout);
+ guard(mutex)(&vm->eviction_lock);
+
+ timeout = drm_sched_entity_flush(&vm->immediate, timeout);
if (timeout <= 0)
return timeout;
- return dma_fence_wait_timeout(vm->last_unlocked, true, timeout);
+ return drm_sched_entity_flush(&vm->delayed, timeout);
}
static void amdgpu_vm_destroy_task_info(struct kref *kref)
</pre>
</blockquote>
</body>
</html>