[PATCH 1/3] signals: Allow generation of SIGKILL to exiting task.
Eric W. Biederman
ebiederm at xmission.com
Tue Apr 24 17:29:32 UTC 2018
Andrey Grodzovsky <Andrey.Grodzovsky at amd.com> writes:
> On 04/24/2018 12:42 PM, Eric W. Biederman wrote:
>> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>>
>>> Currently calling wait_event_killable as part of an exiting process
>>> will stall forever since SIGKILL generation is suppressed by PF_EXITING.
>>>
>>> In our particular case the AMDGPU driver wants to flush all GPU jobs in
>>> flight before shutting down. But if some job hangs the pipe we still want to
>>> be able to kill it and avoid a process in D state.
>> I should clarify. This absolutely can not be done.
>> PF_EXITING is set just before a task starts tearing down its signal
>> handling.
>>
>> So delivering any signal, or otherwise depending on signal handling
>> after PF_EXITING is set can not be done. That abstraction is gone.
>
> I see, so you suggest it's the driver's responsibility to avoid creating
> a code path that ends up calling wait_event_killable from the exit call
> stack (PF_EXITING == 1)?

I don't just suggest.

I am saying clearly that any dependency on receiving SIGKILL after
PF_EXITING is set is a bug.

It looks safe (the bitmap is not freed) to use wait_event_killable on a
dual-use code path, but you can't expect SIGKILL ever to be delivered
during f_op->release, as f_op->release is called from exit after signal
handling has been shut down.

The best generic code could do would be to always have
fatal_signal_pending return true after PF_EXITING is set.
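
To make that idea concrete, here is a rough userspace model in C. It is
not kernel code and not a proposed patch: struct task_model,
MODEL_PF_EXITING and model_fatal_signal_pending are hypothetical names
invented for this illustration, and the real fatal_signal_pending looks
at the task's pending signal set rather than at a boolean.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the kernel's PF_EXITING. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, heavily simplified stand-in for a task. */
struct task_model {
    unsigned int flags;   /* MODEL_PF_EXITING set once exit has begun */
    bool sigkill_pending; /* models "SIGKILL is queued for this task" */
};

/*
 * Model of the suggested behaviour: once the task is exiting, report a
 * fatal signal as pending unconditionally, so that a killable wait
 * bails out instead of blocking on a signal that can never arrive.
 */
bool model_fatal_signal_pending(const struct task_model *t)
{
    if (t->flags & MODEL_PF_EXITING)
        return true;
    return t->sigkill_pending;
}
```

Under this model a killable wait entered from the exit path would return
immediately with an error, so the caller still has to cope with work
that was never flushed.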

Increasingly I am thinking that drm_sched_entity_fini should have a
wait_event_timeout or no wait at all. The cleanup code should have
a progress guarantee of its own.
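
A minimal userspace sketch of what a bounded wait with its own progress
guarantee could look like. This is only a model under stated
assumptions, not the drm scheduler code: struct entity_model and every
model_* function are hypothetical, and time is a simple tick counter
rather than jiffies. The return convention of model_wait_idle_timeout
follows wait_event_timeout (0 on timeout with the condition still
false, otherwise a positive remaining budget).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler entity with queued jobs. */
struct entity_model {
    int jobs_in_flight;
    bool hung; /* if true, jobs never retire on their own */
};

/* One unit of simulated time: a healthy entity retires one job. */
void model_tick(struct entity_model *e)
{
    if (!e->hung && e->jobs_in_flight > 0)
        e->jobs_in_flight--;
}

/*
 * Bounded wait for the entity to go idle.
 * Returns 0 if the budget ran out with jobs still in flight,
 * otherwise the remaining budget (at least 1).
 */
long model_wait_idle_timeout(struct entity_model *e, long budget)
{
    while (budget > 0) {
        if (e->jobs_in_flight == 0)
            return budget;
        model_tick(e);
        budget--;
    }
    return e->jobs_in_flight == 0 ? 1 : 0;
}

/*
 * Cleanup that always terminates: wait for a bounded time, then
 * forcibly drop whatever is left. Returns the number of jobs dropped.
 * No signal delivery is involved at any point.
 */
int model_entity_fini(struct entity_model *e, long budget)
{
    int dropped = 0;

    if (model_wait_idle_timeout(e, budget) == 0) {
        dropped = e->jobs_in_flight;
        e->jobs_in_flight = 0;
    }
    return dropped;
}
```

The point of the sketch is that termination depends only on the budget,
so it behaves the same whether it is reached from an explicit close or
from the exit path after PF_EXITING is set.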

Eric