<div><div>Hi all,</div><div>       Some processes are stuck in D state. The stack is:</div><div> </div><div>#0 [ffff0001343e3a40] __switch_to at ffff000008088870</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/arch/arm64/kernel/process.c: 491</div><div>#1 [ffff0001343e3a60] __schedule at ffff000008bf8508</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/kernel/sched/core.c: 2851</div><div>#2 [ffff0001343e3af0] schedule at ffff000008bf8be8</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/kernel/sched/core.c: 3543</div><div>#3 [ffff0001343e3b00] drm_sched_entity_flush at ffff000000ce6054 [gpu_sched]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/scheduler/sched_entity.c: 187     ----3</div><div>#4 [ffff0001343e3b70] drm_sched_entity_destroy at ffff000000ce6430 [gpu_sched]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/scheduler/sched_entity.c: 317</div><div>#5 [ffff0001343e3b90] amdgpu_vm_fini at ffff0000019f8054 [amdgpu]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c: 2883</div><div>#6 [ffff0001343e3c20] amdgpu_driver_postclose_kms at ffff0000019c856c [amdgpu]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c: 993</div><div>#7 [ffff0001343e3c90] drm_file_free at ffff000000fe44dc [drm]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/drm_file.c: 254</div><div>#8 [ffff0001343e3cf0] drm_release at ffff000000fe4bd4 [drm]</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/drm_file.c: 215</div><div>#9 [ffff0001343e3d40] __fput at ffff000008338368    ----2</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/fs/file_table.c: 278</div><div>#10 [ffff0001343e3d90] delayed_fput at ffff000008338514</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/fs/file_table.c: 304</div><div>#11 [ffff0001343e3db0] process_one_work at ffff00000810e7e0</div><div>    
/usr/src/linux-4.19.36-1.2.159.aarch64/kernel/workqueue.c: 2153</div><div>#12 [ffff0001343e3e00] worker_thread at ffff00000810ec60  ----1</div><div>    /usr/src/linux-4.19.36-1.2.159.aarch64/kernel/workqueue.c: 2212</div><div>#13 [ffff0001343e3e70] kthread at ffff000008115e60</div><div><br></div><div>The kernel's delayed fput work calls drm_release once the DRM file is no longer in use.</div><div>But at point 3), the function does not reach wait_event_timeout().</div><div>The code is:</div><div>long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)</div><div>{</div><div>                struct drm_gpu_scheduler *sched;</div><div>                struct task_struct *last_user;</div><div>                long ret = timeout;</div><div><br></div><div>                if (!entity->rq)</div><div>                                return 0;</div><div><br></div><div>                sched = entity->rq->sched;</div><div>                /**</div><div>                * The client will not queue more IBs during this fini, consume existing</div><div>                * queued IBs or discard them on SIGKILL</div><div>                */</div><div>                if (current->flags & PF_EXITING) {      ------ When the caller is a kernel task such as the delayed work, this branch is not taken. 
But the application process has already exited.</div><div>                                if (timeout)</div><div>                                                ret = wait_event_timeout(</div><div>                                                                                sched->job_scheduled,</div><div>                                                                                drm_sched_entity_is_idle(entity),</div><div>                                                                                timeout);</div><div>                } else {</div><div>                                wait_event_killable(sched->job_scheduled,</div><div>                                                                    drm_sched_entity_is_idle(entity));</div><div>                }</div><div><br></div><div>So when the current process is an asynchronous kernel task such as the delayed work, the wait_event_timeout() branch is not taken, even though the application process has already exited.</div><div><br></div><div>Why does drm_sched_entity_flush not check for this case (an asynchronous kernel thread calls drm_sched_entity_flush after the app has already exited)?</div><div>Can I add a check for an asynchronous kernel thread, and then call wait_event_timeout and drm_sched_rq_remove_entity?</div><div>Thanks.</div><div><br></div><div>Remarks:</div><div>A similar process stack:</div><div># cat /proc/336121/stack</div><div>[<0>] __switch_to+0x94/0xe8</div><div>[<0>] drm_sched_entity_flush+0xf8/0x248 [gpu_sched]</div><div>[<0>] amdgpu_ctx_mgr_entity_flush+0xac/0x148 [amdgpu]</div><div>[<0>] amdgpu_flush+0x2c/0x50 [amdgpu]</div><div>[<0>] filp_close+0x40/0xa0</div><div>[<0>] put_files_struct+0x118/0x120</div><div>[<0>] put_files_struct+0x30/0x68 [binder_linux]</div><div>[<0>] binder_deferred_func+0x4d4/0x658 [binder_linux]</div><div>[<0>] process_one_work+0x1b4/0x3f8</div><div>[<0>] worker_thread+0x54/0x470</div><div>[<0>] kthread+0x134/0x138b</div><div>[<0>] ret_from_fork+0x10/0x18</div><div>[<0>] 
0xffffffffffffffff</div></div>
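<div><br></div><div>A minimal user-space sketch of the check asked about above, assuming the PF_EXITING and PF_KTHREAD values from include/linux/sched.h in Linux 4.19. The enum and both function names are hypothetical and exist only for illustration. It models the branch selection only; it is not a tested kernel patch and not the upstream behaviour. As far as I know, kernel threads ignore signals by default, so a workqueue worker that takes the killable branch can only be woken by the entity becoming idle, which would match the D state in the stacks.</div>

```c
/* Flag values as defined in include/linux/sched.h (Linux 4.19). */
#define PF_EXITING 0x00000004u /* task is being torn down */
#define PF_KTHREAD 0x00200000u /* task is a kernel thread */

/* Hypothetical names for this sketch; they do not exist in the kernel. */
enum flush_wait {
    FLUSH_WAIT_TIMEOUT, /* stands for the wait_event_timeout() branch  */
    FLUSH_WAIT_KILLABLE /* stands for the wait_event_killable() branch */
};

/*
 * Branch selection as in the quoted excerpt: only a task that is
 * itself exiting gets the bounded wait.
 */
enum flush_wait flush_wait_current(unsigned int task_flags)
{
    if (task_flags & PF_EXITING)
        return FLUSH_WAIT_TIMEOUT;
    return FLUSH_WAIT_KILLABLE;
}

/*
 * Assumed variant of the proposal: a kernel thread (for example a
 * workqueue worker running delayed_fput) is also given the bounded
 * wait, because it closes the file on behalf of a process that may
 * already be gone.
 */
enum flush_wait flush_wait_proposed(unsigned int task_flags)
{
    if (task_flags & (PF_EXITING | PF_KTHREAD))
        return FLUSH_WAIT_TIMEOUT;
    return FLUSH_WAIT_KILLABLE;
}
```

<div>With these definitions, flush_wait_current(PF_KTHREAD) selects the killable wait, while flush_wait_proposed(PF_KTHREAD) selects the bounded one. Whether it is safe to follow a bounded wait with drm_sched_rq_remove_entity while jobs are still queued is a separate question that this sketch does not answer.</div>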