[PATCH] drm/amdkfd: kfd open return failed if device is locked
Yang, Philip
Philip.Yang at amd.com
Fri Oct 18 17:34:47 UTC 2019
On 2019-10-18 11:40 a.m., Kuehling, Felix wrote:
> On 2019-10-18 10:27 a.m., Yang, Philip wrote:
>> If the device is locked for suspend and resume, kfd open should fail with
>> -EAGAIN without creating a process. Otherwise, when the application exits,
>> releasing the process will hang waiting for resume to finish if suspend
>> and resume is stuck somewhere. This is the backtrace:
>
> This doesn't fix processes that were created before suspend/resume got
> stuck. They would still get stuck with the same backtrace. So this is
> just a band-aid. The real underlying problem, that is not getting
> addressed, is suspend/resume getting stuck.
>
> Am I missing something?
>
This patch addresses applications getting stuck on exit after
suspend/resume has gotten stuck. The underlying suspend/resume issue
should be addressed separately.
I will submit a v2 patch that also fixes processes that were created
before suspend/resume got stuck.
Philip
> Regards,
> Felix
>
>
>>
>> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more than 120 seconds.
>> [Thu Oct 17 16:43:37 2019] Not tainted 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
>> [Thu Oct 17 16:43:37 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Oct 17 16:43:37 2019] rocminfo D 0 3024 2947 0x80000000
>> [Thu Oct 17 16:43:37 2019] Call Trace:
>> [Thu Oct 17 16:43:37 2019] ? __schedule+0x3d9/0x8a0
>> [Thu Oct 17 16:43:37 2019] schedule+0x32/0x70
>> [Thu Oct 17 16:43:37 2019] schedule_preempt_disabled+0xa/0x10
>> [Thu Oct 17 16:43:37 2019] __mutex_lock.isra.9+0x1e3/0x4e0
>> [Thu Oct 17 16:43:37 2019] ? __call_srcu+0x264/0x3b0
>> [Thu Oct 17 16:43:37 2019] ? process_termination_cpsch+0x24/0x2f0 [amdgpu]
>> [Thu Oct 17 16:43:37 2019] process_termination_cpsch+0x24/0x2f0 [amdgpu]
>> [Thu Oct 17 16:43:37 2019] kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
>> [Thu Oct 17 16:43:37 2019] kfd_process_notifier_release+0x1be/0x220 [amdgpu]
>> [Thu Oct 17 16:43:37 2019] __mmu_notifier_release+0x3e/0xc0
>> [Thu Oct 17 16:43:37 2019] exit_mmap+0x160/0x1a0
>> [Thu Oct 17 16:43:37 2019] ? __handle_mm_fault+0xba3/0x1200
>> [Thu Oct 17 16:43:37 2019] ? exit_robust_list+0x5a/0x110
>> [Thu Oct 17 16:43:37 2019] mmput+0x4a/0x120
>> [Thu Oct 17 16:43:37 2019] do_exit+0x284/0xb20
>> [Thu Oct 17 16:43:37 2019] ? handle_mm_fault+0xfa/0x200
>> [Thu Oct 17 16:43:37 2019] do_group_exit+0x3a/0xa0
>> [Thu Oct 17 16:43:37 2019] __x64_sys_exit_group+0x14/0x20
>> [Thu Oct 17 16:43:37 2019] do_syscall_64+0x4f/0x100
>> [Thu Oct 17 16:43:37 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index d9e36dbf13d5..40d75c39f08e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file *filep)
>>  		return -EPERM;
>>  	}
>>
>> +	if (kfd_is_locked())
>> +		return -EAGAIN;
>> +
>>  	process = kfd_create_process(filep);
>>  	if (IS_ERR(process))
>>  		return PTR_ERR(process);
>>
>> -	if (kfd_is_locked())
>> -		return -EAGAIN;
>> -
>>  	dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>>  		process->pasid, process->is_32bit_user_mode);
>>
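
The effect of moving the kfd_is_locked() check can be illustrated with a small stand-alone model. This is only a sketch: device_locked, live_processes, and the model_* functions are hypothetical stand-ins, not the real amdkfd code, and the model ignores everything except the order of the two steps.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state (not the real amdkfd code). */
static bool device_locked;   /* models what kfd_is_locked() reports */
static int live_processes;   /* models processes made by kfd_create_process() */

static bool model_is_locked(void)
{
    return device_locked;
}

static void model_create_process(void)
{
    live_processes++;
}

/* Old order: create the process first, then check the lock. */
static int model_open_old(void)
{
    model_create_process();
    if (model_is_locked())
        return -EAGAIN;      /* fails, but a process now exists */
    return 0;
}

/* New order from the patch: check the lock before creating anything. */
static int model_open_new(void)
{
    if (model_is_locked())
        return -EAGAIN;      /* fails with no process left behind */
    model_create_process();
    return 0;
}
```

With device_locked set, both variants return -EAGAIN, but only model_open_old() leaves live_processes incremented. In the real driver, that leftover process is the one whose release at application exit blocks in process_termination_cpsch, as in the backtrace.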