<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 7/16/2025 12:57 PM, Gang Ba wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:20250716175753.703955-1-Gang.Ba@amd.com">
      <pre wrap="" class="moz-quote-pre">If vm belongs to another process, this is fclose after fork,
wait may enable signaling KFD eviction fence and cause parent process queue evicted.</pre>
    </blockquote>
    <p>The commit message does not describe the issue accurately. <span style="white-space: pre-wrap">amdgpu_flush</span> is triggered from
      the child process when it makes the execve system call, because the
      render node was opened with the O_CLOEXEC flag. fork alone does not
      close the file descriptors inherited by the child process. The
      backtrace below also shows that: the call chain runs from
      do_execveat_common through do_close_on_exec to amdgpu_flush.</p>
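    <p>A minimal user-space sketch of that difference, not amdgpu-specific:
      /dev/null stands in for the render node, and the helper names
      fd_is_open and fd_open_in_child are made up for illustration. It
      assumes a POSIX system where O_CLOEXEC is available.</p>
    <pre>
```python
import os
import sys


def fd_is_open(fd):
    """Return True if fd is a valid open descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def fd_open_in_child(fd, do_exec):
    """Fork, optionally exec a fresh interpreter in the child, and
    report whether fd is still open there (True) or closed (False)."""
    pid = os.fork()
    if pid == 0:
        # Child: exit code 0 means "fd open", 1 means "fd closed".
        try:
            if do_exec:
                probe = (
                    "import os, sys\n"
                    "try:\n"
                    f"    os.fstat({fd})\n"
                    "except OSError:\n"
                    "    sys.exit(1)\n"
                )
                # Close-on-exec descriptors are closed during this call.
                os.execv(sys.executable, [sys.executable, "-c", probe])
            os._exit(0 if fd_is_open(fd) else 1)
        finally:
            # Only reached if execv itself failed.
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)
    if code not in (0, 1):
        raise RuntimeError("child could not run the probe")
    return code == 0


if __name__ == "__main__":
    fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)
    print("open after fork only:  ", fd_open_in_child(fd, do_exec=False))
    print("open after fork + exec:", fd_open_in_child(fd, do_exec=True))
    os.close(fd)
```
    </pre>
    <p>The fork-only child should still see the descriptor, while the child
      that calls exec should find it closed. That close is the point where
      the kernel would call the driver's flush callback.</p>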
    <p>Regards</p>
    <p>Xiaogang<br>
    </p>
    <blockquote type="cite" cite="mid:20250716175753.703955-1-Gang.Ba@amd.com">
      <pre wrap="" class="moz-quote-pre">

[677852.634569]  amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
[677852.634814]  __dma_fence_enable_signaling+0x3e/0xe0
[677852.634820]  dma_fence_wait_timeout+0x3a/0x140
[677852.634825]  amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
[677852.634831]  amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
[677852.635026]  amdgpu_flush+0x34/0x50 [amdgpu]
[677852.635208]  filp_flush+0x38/0x90
[677852.635213]  filp_close+0x14/0x30
[677852.635216]  do_close_on_exec+0xdd/0x130
[677852.635221]  begin_new_exec+0x1da/0x490
[677852.635225]  load_elf_binary+0x307/0xea0
[677852.635231]  ? srso_alias_return_thunk+0x5/0xfbef5
[677852.635235]  ? ima_bprm_check+0xa2/0xd0
[677852.635240]  search_binary_handler+0xda/0x260
[677852.635245]  exec_binprm+0x58/0x1a0
[677852.635249]  bprm_execve.part.0+0x16f/0x210
[677852.635254]  bprm_execve+0x45/0x80
[677852.635257]  do_execveat_common.isra.0+0x190/0x200

Suggested-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com">&lt;christian.koenig@amd.com&gt;</a>
Signed-off-by: Gang Ba <a class="moz-txt-link-rfc2396E" href="mailto:Gang.Ba@amd.com">&lt;Gang.Ba@amd.com&gt;</a>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ea9b0f050f79..2f75f967f95f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2414,13 +2414,13 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size,
  */
 long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout)
 {
-       timeout = dma_resv_wait_timeout(vm->root.bo->tbo.base.resv,
-                                       DMA_RESV_USAGE_BOOKKEEP,
-                                       true, timeout);
+       guard(mutex)(&vm->eviction_lock);
+
+       timeout = drm_sched_entity_flush(&vm->immediate, timeout);
        if (timeout <= 0)
                return timeout;
 
-       return dma_fence_wait_timeout(vm->last_unlocked, true, timeout);
+       return drm_sched_entity_flush(&vm->delayed, timeout);
 }
 
 static void amdgpu_vm_destroy_task_info(struct kref *kref)
</pre>
    </blockquote>
  </body>
</html>