[PATCH] drm/amdkfd: kfd open return failed if device is locked
Kuehling, Felix
Felix.Kuehling at amd.com
Fri Oct 18 15:40:45 UTC 2019
On 2019-10-18 10:27 a.m., Yang, Philip wrote:
> If the device is locked for suspend and resume, kfd open should fail with
> -EAGAIN without creating a process. Otherwise, when the application exits
> and releases the process, it hangs waiting for resume to finish if suspend
> and resume is stuck somewhere. This is the backtrace:
This doesn't fix processes that were created before suspend/resume got
stuck. They would still get stuck with the same backtrace. So this is
just a band-aid. The real underlying problem, which is not being
addressed, is suspend/resume getting stuck.
Am I missing something?
Regards,
Felix
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019] Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfo D 0 3024 2947
> 0x80000000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019] ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019] schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019] schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019] __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019] ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019] ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019] process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019] kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019] __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019] exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019] ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019] ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019] mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019] do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019] ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019] do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019] __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019] do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file *filep)
>  		return -EPERM;
>  	}
>
> +	if (kfd_is_locked())
> +		return -EAGAIN;
> +
>  	process = kfd_create_process(filep);
>  	if (IS_ERR(process))
>  		return PTR_ERR(process);
>
> -	if (kfd_is_locked())
> -		return -EAGAIN;
> -
>  	dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>  		process->pasid, process->is_32bit_user_mode);
>
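
To make the two points in this thread concrete, here is a minimal user-space
sketch in C. It is only a model: struct model_dev and the model_* functions
are hypothetical names invented for illustration and are not part of amdkfd.
The `locked` flag stands in for kfd_is_locked(), and "exit would block" is
assumed to mean that process teardown needs a lock that a stuck
suspend/resume never releases.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the amdkfd driver. */
struct model_dev {
	bool locked;   /* stands in for kfd_is_locked() */
	int nr_procs;  /* processes created and not yet released */
};

/* Old ordering: create the process first, then check the lock. */
static int model_open_old(struct model_dev *d)
{
	d->nr_procs++;            /* models kfd_create_process() */
	if (d->locked)
		return -EAGAIN;   /* the process is left behind until exit */
	return 0;
}

/* Patched ordering: check the lock before creating anything. */
static int model_open_new(struct model_dev *d)
{
	if (d->locked)
		return -EAGAIN;
	d->nr_procs++;
	return 0;
}

/* Assumption: teardown at exit blocks while the device stays locked. */
static bool model_exit_would_block(const struct model_dev *d)
{
	return d->locked && d->nr_procs > 0;
}

/* Walks through the three scenarios discussed in the thread. */
static void model_demo(void)
{
	struct model_dev old_dev = { .locked = true, .nr_procs = 0 };
	struct model_dev new_dev = { .locked = true, .nr_procs = 0 };
	struct model_dev early = { .locked = false, .nr_procs = 0 };

	/* Old ordering: open fails, but exit still has a process to release. */
	assert(model_open_old(&old_dev) == -EAGAIN);
	assert(model_exit_would_block(&old_dev));

	/* Patched ordering: open fails and nothing is left to release. */
	assert(model_open_new(&new_dev) == -EAGAIN);
	assert(!model_exit_would_block(&new_dev));

	/* Process opened before the lock: the patch does not help here. */
	assert(model_open_new(&early) == 0);
	early.locked = true;      /* suspend/resume gets stuck afterwards */
	assert(model_exit_would_block(&early));
}
```

Calling model_demo() exercises all three cases. The last one is the
objection raised in the reply: a process that already exists when
suspend/resume gets stuck still blocks at exit, regardless of the ordering
in kfd_open().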