[PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Andrey.Grodzovsky at amd.com
Tue Apr 24 16:43:28 UTC 2018
On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>> Avoid calling wait_event_killable when you are possibly being called
>> from get_signal routine since in that case you end up in a deadlock
>> where you are already blocked in signal processing and trying to wait
>> on a new signal.
> I am curious what the problematic call path is here.
Here is the problematic call stack:
[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
On exit from the system call, you process all the signals you received and
encounter a fatal signal, which triggers process termination.
> In general waiting seems wrong when the process has already been
> fatally killed as indicated by PF_SIGNALED.
Indeed, this patch avoids the wait in this case.
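To make the difference concrete, here is a minimal userspace sketch of the
two checks. It is only a model, not the scheduler code: struct model_task,
the MODEL_* constants and the model_* functions are hypothetical stand-ins
for current->flags, current->exit_code and the kernel definitions, and a
return value of 0 stands for "would have gone on to call
wait_event_killable".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions (values assumed). */
#define MODEL_PF_SIGNALED 0x00000400u /* task was killed by a signal */
#define MODEL_SIGKILL     9
#define MODEL_SIGTERM     15
#define MODEL_ERESTARTSYS 512

/* Models only the two fields of "current" that the check looks at. */
struct model_task {
	unsigned int flags;
	int exit_code;
};

/* Pre-patch check: skip the wait only when the killer was SIGKILL. */
bool model_old_skips_wait(const struct model_task *t)
{
	return (t->flags & MODEL_PF_SIGNALED) &&
	       t->exit_code == MODEL_SIGKILL;
}

/* Patched check: skip the wait for any fatal signal. */
bool model_new_skips_wait(const struct model_task *t)
{
	return (t->flags & MODEL_PF_SIGNALED) != 0;
}

/*
 * Returns -MODEL_ERESTARTSYS when the wait is skipped, or 0 to mean
 * "the real code would call wait_event_killable here".
 */
int model_fini_status(const struct model_task *t, bool patched)
{
	bool skip = patched ? model_new_skips_wait(t)
			    : model_old_skips_wait(t);

	return skip ? -MODEL_ERESTARTSYS : 0;
}
```

In this model, a task marked PF_SIGNALED with exit_code SIGTERM still takes
the waiting branch under the old check, while the patched check skips the
wait for it just as for SIGKILL.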
> Returning -ERESTARTSYS seems wrong as nothing should make it back even
> to the edge of userspace here.
Can you please clarify what should be returned here instead?
> Given that this is the only use of PF_SIGNALED outside of bsd process
> accounting I find this code very suspicious.
> It looks like the code path that gets called during exit is buggy and needs
> to be sorted out.
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> index 088ff2b..09fd258 100644
>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>  		return;
>>  	/**
>>  	 * The client will not queue more IBs during this fini, consume existing
>> -	 * queued IBs or discard them on SIGKILL
>> +	 * queued IBs or discard them when in death signal state since
>> +	 * wait_event_killable can't receive signals in that state.
>>  	*/
>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>> +	if (current->flags & PF_SIGNALED)
>>  		entity->fini_status = -ERESTARTSYS;
>>  	else
>>  		entity->fini_status = wait_event_killable(sched->job_scheduled,