System error
1577332900
1577332900 at qq.com
Tue Sep 3 12:31:55 UTC 2019
Hi all,
Some processes are stuck in D state (uninterruptible sleep). The stack is:
#0 [ffff0001343e3a40] __switch_to at ffff000008088870
/usr/src/linux-4.19.36-1.2.159.aarch64/arch/arm64/kernel/process.c: 491
#1 [ffff0001343e3a60] __schedule at ffff000008bf8508
/usr/src/linux-4.19.36-1.2.159.aarch64/kernel/sched/core.c: 2851
#2 [ffff0001343e3af0] schedule at ffff000008bf8be8
/usr/src/linux-4.19.36-1.2.159.aarch64/kernel/sched/core.c: 3543
#3 [ffff0001343e3b00] drm_sched_entity_flush at ffff000000ce6054 [gpu_sched]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/scheduler/sched_entity.c: 187 ----3
#4 [ffff0001343e3b70] drm_sched_entity_destroy at ffff000000ce6430 [gpu_sched]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/scheduler/sched_entity.c: 317
#5 [ffff0001343e3b90] amdgpu_vm_fini at ffff0000019f8054 [amdgpu]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c: 2883
#6 [ffff0001343e3c20] amdgpu_driver_postclose_kms at ffff0000019c856c [amdgpu]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c: 993
#7 [ffff0001343e3c90] drm_file_free at ffff000000fe44dc [drm]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/drm_file.c: 254
#8 [ffff0001343e3cf0] drm_release at ffff000000fe4bd4 [drm]
/usr/src/linux-4.19.36-1.2.159.aarch64/drivers/gpu/drm/drm_file.c: 215
#9 [ffff0001343e3d40] __fput at ffff000008338368 ----2
/usr/src/linux-4.19.36-1.2.159.aarch64/fs/file_table.c: 278
#10 [ffff0001343e3d90] delayed_fput at ffff000008338514
/usr/src/linux-4.19.36-1.2.159.aarch64/fs/file_table.c: 304
#11 [ffff0001343e3db0] process_one_work at ffff00000810e7e0
/usr/src/linux-4.19.36-1.2.159.aarch64/kernel/workqueue.c: 2153
#12 [ffff0001343e3e00] worker_thread at ffff00000810ec60 ----1
/usr/src/linux-4.19.36-1.2.159.aarch64/kernel/workqueue.c: 2212
#13 [ffff0001343e3e70] kthread at ffff000008115e60
The kernel's delayed fput work calls drm_release() once the DRM file is no longer in use.
But at point 3), the function never reaches wait_event_timeout().
The code is:
long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
{
	struct drm_gpu_scheduler *sched;
	struct task_struct *last_user;
	long ret = timeout;

	if (!entity->rq)
		return 0;

	sched = entity->rq->sched;
	/**
	 * The client will not queue more IBs during this fini, consume existing
	 * queued IBs or discard them on SIGKILL
	 */
	/* ------ When the caller is a kernel task, such as the delayed fput
	 * worker, this branch is not taken, even though the application
	 * process has already exited.
	 */
	if (current->flags & PF_EXITING) {
		if (timeout)
			ret = wait_event_timeout(
					sched->job_scheduled,
					drm_sched_entity_is_idle(entity),
					timeout);
	} else {
		wait_event_killable(sched->job_scheduled,
				    drm_sched_entity_is_idle(entity));
	}
So when the current task is an asynchronous kernel thread, such as the delayed fput worker, it never reaches the wait_event_timeout() branch and waits in wait_event_killable() instead, even though the application process has already exited.
Why does drm_sched_entity_flush() not handle this case, where an asynchronous kernel thread calls it after the application has already exited?
Can I add a check for asynchronous kernel threads, and then call wait_event_timeout() and drm_sched_rq_remove_entity()?
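To make the question concrete, here is a minimal userspace sketch of only the branch selection, not of the kernel function itself. The PF_EXITING and PF_KTHREAD values are copied from include/linux/sched.h. Everything else is an assumption made for illustration: the enum, the function names, and in particular the choice of PF_KTHREAD as the "asynchronous kernel thread" check, which the mail above does not specify. The model also ignores the `if (timeout)` test inside the first branch.

```c
#include <assert.h>

/* Values as defined in include/linux/sched.h, copied here so the
 * sketch builds in userspace. */
#define PF_EXITING 0x00000004u /* task is exiting */
#define PF_KTHREAD 0x00200000u /* task is a kernel thread */

/* Hypothetical names, for illustration only. */
enum wait_kind {
	WAIT_TIMEOUT,  /* stands for the wait_event_timeout() branch */
	WAIT_KILLABLE, /* stands for the wait_event_killable() branch */
};

/* Branch selected by the code quoted above. */
static enum wait_kind current_branch(unsigned int task_flags)
{
	return (task_flags & PF_EXITING) ? WAIT_TIMEOUT : WAIT_KILLABLE;
}

/* Assumed variant: kernel threads also take the timeout branch. */
static enum wait_kind proposed_branch(unsigned int task_flags)
{
	return (task_flags & (PF_EXITING | PF_KTHREAD)) ? WAIT_TIMEOUT
							: WAIT_KILLABLE;
}

static void branch_model_self_check(void)
{
	/* Worker thread: kernel thread, not exiting. */
	assert(current_branch(PF_KTHREAD) == WAIT_KILLABLE);
	assert(proposed_branch(PF_KTHREAD) == WAIT_TIMEOUT);

	/* Exiting user process: both variants use the timeout wait. */
	assert(current_branch(PF_EXITING) == WAIT_TIMEOUT);
	assert(proposed_branch(PF_EXITING) == WAIT_TIMEOUT);

	/* Running user process closing the file: unchanged. */
	assert(current_branch(0) == WAIT_KILLABLE);
	assert(proposed_branch(0) == WAIT_KILLABLE);
}
```

Calling branch_model_self_check() from a small main() exercises the three cases. Whether a timed wait is actually safe for a worker thread in the real driver is exactly the open question, so this only shows which branch each kind of caller would take.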
Thanks.
Remarks:
Another stack showing the same problem:
# cat /proc/336121/stack
[<0>] __switch_to+0x94/0xe8
[<0>] drm_sched_entity_flush+0xf8/0x248 [gpu_sched]
[<0>] amdgpu_ctx_mgr_entity_flush+0xac/0x148 [amdgpu]
[<0>] amdgpu_flush+0x2c/0x50 [amdgpu]
[<0>] filp_close+0x40/0xa0
[<0>] put_files_struct+0x118/0x120
[<0>] put_files_struct+0x30/0x68 [binder_linux]
[<0>] binder_deferred_func+0x4d4/0x658 [binder_linux]
[<0>] process_one_work+0x1b4/0x3f8
[<0>] worker_thread+0x54/0x470
[<0>] kthread+0x134/0x138b
[<0>] ret_from_fork+0x10/0x18
[<0>] 0xffffffffffffffff