[PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Tue Apr 24 16:43:28 UTC 2018



On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>
>> Avoid calling wait_event_killable when you are possibly being called
>> from get_signal routine since in that case you end up in a deadlock
>> where you are alreay blocked in singla processing any trying to wait
>> on a new signal.
> I am curious what the call path that is problematic here.

Here is the problematic call stack

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

On exit from system call you process all the signals you received and 
encounter a fatal signal which triggers process termination.

>
> In general waiting seems wrong when the process has already been
> fatally killed as indicated by PF_SIGNALED.

So indeed this patch avoids wait in this case.

>
> Returning -ERESTARTSYS seems wrong as nothing should make it back even
> to the edge of userspace here.

Can you clarify please - what should be returned here instead ?

Andrey

>
> Given that this is the only use of PF_SIGNALED outside of bsd process
> accounting I find this code very suspicious.
>
> It looks the code path that gets called during exit is buggy and needs
> to be sorted out.
>
> Eric
>
>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> index 088ff2b..09fd258 100644
>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>   		return;
>>   	/**
>>   	 * The client will not queue more IBs during this fini, consume existing
>> -	 * queued IBs or discard them on SIGKILL
>> +	 * queued IBs or discard them when in death signal state since
>> +	 * wait_event_killable can't receive signals in that state.
>>   	*/
>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>> +	if (current->flags & PF_SIGNALED)
>>   		entity->fini_status = -ERESTARTSYS;
>>   	else
>>   		entity->fini_status = wait_event_killable(sched->job_scheduled,



More information about the amd-gfx mailing list