[systemd-devel] SuccessExitStatus , user slice, SSH ?

Steve Traylen steve.traylen at cern.ch
Thu Oct 31 14:09:12 UTC 2024


On 31/10/2024 12:48, Lennart Poettering wrote:

> On Do, 31.10.24 10:20, Steve Traylen (steve.traylen at cern.ch) wrote:
>
>> Hi,
>>
>> I was trying to suppress user scope units that are considered failed
>> because they required a SIGKILL. A typical log might be:
>>
>> Oct 30 10:27:55 node989.example.ch systemd[1]: session-3804.scope: Killing
>> process 1550946 (node) with signal SIGKILL.
>> Oct 30 10:29:25 node989.example.ch systemd[1]: session-3804.scope: Still
>> around after SIGKILL. Ignoring.
>> Oct 30 10:29:25 node989.example.ch systemd[1]: session-3804.scope: Failed
>> with result 'timeout'.
>> Oct 30 10:29:25 node989.example.ch systemd[1]: session-3804.scope: Consumed
>> 1min 30.745s CPU time.
>>
>> I doubt increasing the timeout will help. I had thought that a
>>
>> # /etc/systemd/system/user-.slice.d/ignore-timeout.conf
>>
>> [Slice]
>> SuccessExitStatus=SIGKILL
>>
>> might help, but alas SuccessExitStatus can only be set on services, it
>> seems.
> SuccessExitStatus is really just about process exit statuses,
> i.e. about the waitid() info that the service manager will see for the
> main service process.
>
> In your case the scope fails due to the timeout, not because the
> service would exit due to a SIGKILL (after all, it *doesn't* exit even
> though the SIGKILL was sent).
>
> Note that a process that is unkillable even by SIGKILL usually
> indicates some driver/kernel bug. Unkillable processes are not the
> norm on Linux.
>
> I am not entirely sure what we are trying to do? You don't want to see the
> logs about the scope not going away? We don't support suppressing such
> logs for scope units.
>
> Or are you concerned that these .scope units stick around? Well, they
> do that because they still contain a process. We do not support a
> mechanism to hide units that still have running processes, sorry. And
> I am pretty sure we should not support this.
>
> I'd recommend figuring out why these processes do not react to SIGKILL
> instead of hiding the issue.
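
To illustrate the waitid() point above, here is a minimal Python sketch
(not systemd code; the helper name child_wait_info is made up for this
example). It forks a child, sends it SIGKILL, and reads back the kind of
information a parent process gets from waitid(). SuccessExitStatus is
described above as being about this per-process information, which is
separate from a unit's 'timeout' result.

```python
import os
import signal


def child_wait_info():
    """Fork a child, kill it with SIGKILL, return (si_code, si_status)."""
    pid = os.fork()
    if pid == 0:
        # Child: block until a signal arrives. SIGKILL cannot be caught,
        # so the child never returns from pause().
        try:
            signal.pause()
        finally:
            os._exit(0)
    # Parent: send SIGKILL, then collect the child's termination info.
    os.kill(pid, signal.SIGKILL)
    info = os.waitid(os.P_PID, pid, os.WEXITED)
    return info.si_code, info.si_status


if __name__ == "__main__":
    code, status = child_wait_info()
    print("killed by signal:", code == os.CLD_KILLED)
    print("signal number:", status)
```

Here the child really does terminate, so waitid() reports CLD_KILLED with
SIGKILL as the status. In the first log above the process was still
around after SIGKILL, so there was no such exit status to match against.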

Yes, I did not read that log carefully enough. That one is really bad: if
SIGKILL does not work, it should not be hidden. What I thought I was
asking about was this case:

Oct 30 12:32:30 node9107.example.ch systemd[1]: session-12336.scope: 
Stopping timed out. Killing.
Oct 30 12:32:30 node9107.example.ch systemd[1]: session-12336.scope: 
Killing process 2227059 (bash) with signal SIGKILL.
Oct 30 12:32:49 node9107.example.ch systemd[1]: session-12336.scope: 
Failed with result 'timeout'.

That was the less bad version, which I read as the SIGTERM having failed
but the SIGKILL having been successful.

In this less bad case, is it still "Failed with result 'timeout'" because
SIGTERM "fails" and it has to rely on SIGKILL, or is that the SIGKILL
failing?

I was hoping not to see these logs (as a failure) when it merely resorts
to a successful SIGKILL.
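
To tell the two cases apart in our logs, something like the following
Python sketch could be used. It is only an illustration: the function
name classify and the labels it returns are made up here, and it assumes
unwrapped journal lines (one message per line) in the format quoted
above. It only checks whether a "Still around after SIGKILL" message was
logged for a scope; it does not prove that the SIGKILL succeeded in the
other case.

```python
import re

# Matches lines such as:
#   "... systemd[1]: session-3804.scope: Failed with result 'timeout'."
SCOPE_LINE = re.compile(r"systemd\[1\]: (?P<unit>\S+\.scope): (?P<msg>.*)$")


def classify(lines):
    """Map each scope unit seen in `lines` to a rough category."""
    flags_by_unit = {}
    for line in lines:
        match = SCOPE_LINE.search(line)
        if match is None:
            continue
        flags = flags_by_unit.setdefault(match["unit"], set())
        msg = match["msg"]
        if msg.startswith("Still around after SIGKILL"):
            flags.add("unkillable")
        elif msg.startswith("Failed with result 'timeout'"):
            flags.add("timeout")

    result = {}
    for unit, flags in flags_by_unit.items():
        if "unkillable" in flags:
            # The process survived SIGKILL: the case that should not be hidden.
            result[unit] = "sigkill-ineffective"
        elif "timeout" in flags:
            # Stop timed out, but no "Still around" message was logged.
            result[unit] = "stop-timeout"
        else:
            result[unit] = "other"
    return result
```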

Steve.




> Lennart
>
> --
> Lennart Poettering, Berlin

