[systemd-devel] nsenter and SIGSTOP

Sun Apr 21 15:07:18 PDT 2013

Zbigniew Jędrzejewski-Szmek <zbyszek at in.waw.pl> writes:

> On Sun, Apr 21, 2013 at 09:18:34AM -0700, Eric W. Biederman wrote:
>> Zbigniew Jędrzejewski-Szmek <zbyszek at in.waw.pl> writes:
>> 
>> > On Sat, Apr 20, 2013 at 03:27:46PM -0700, Eric W. Biederman wrote:
>> >> Zbigniew Jędrzejewski-Szmek <zbyszek at in.waw.pl> writes:
>> >> 
>> >> > Hi,
>> >> > I've hit a bit of a problem with nsenter and systemd-nspawn.
>> >> > When nsenter is used to enter the PID namespace created with
>> >> > systemd-nspawn, and the container's init attempts a shutdown,
>> >> > it hangs because nsenter is suspended.
>> >> >
>> >> > The sequence of events leading to the hang is:
>> >> >
>> >> > 1. nsenter launches a shell inside the container with
>> >> >    PPID=0 as seen inside the container,
>> >> > 2. systemd with PID=1 goes through the shutdown sequence,
>> >> >    issuing the equivalent(*) of
>> >> >
>> >> >    kill(-1, SIGSTOP)
>> >> 
>> >> This baffles me.  I am not certain why someone would send SIGSTOP
>> >> when they want processes to exit.  I'm not saying it's wrong, just
>> >> saying that it is odd.
>> > Like Lennart wrote, it's for atomicity of the subsequent killing.
>> 
>> When you don't do kill(-1, SIGTERM) that makes sense.
> Because not all processes are killed: during normal shutdown processes
> with argv[0] beginning with @ are spared
> (http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons).

I wasn't arguing.  But since you bring up root storage daemons and
initial ram disks, I don't believe any of those actually apply to a
container.

My point was only that using SIGSTOP makes sense when it becomes clear
that kill(-1, SIGTERM) is not happening.
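
To spell that out, here is a rough sketch in C of the stop-first pattern.
It is not systemd's actual code.  The helper names (is_spared,
stop_then_signal) and the explicit pid array are made up for
illustration; an init would presumably use kill(-1, ...) for the stop
and continue phases and find the processes to signal some other way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical predicate: the convention quoted above spares processes
 * whose argv[0] begins with '@'. */
static bool is_spared(const char *argv0)
{
    return argv0 != NULL && argv0[0] == '@';
}

/* Hypothetical helper: stop every listed process, send `sig` to the
 * ones that are not spared, then continue all of them.  kill() only
 * queues a signal, so this narrows the window in which a target can
 * fork or react; it is a sketch of the idea, not a hard guarantee.
 * Returns the number of kill() calls that failed. */
static int stop_then_signal(const pid_t *pids, const bool *spare,
                            size_t n, int sig)
{
    int failures = 0;
    size_t i;

    assert(pids != NULL || n == 0);

    for (i = 0; i < n; i++)            /* phase 1: freeze everybody */
        if (kill(pids[i], SIGSTOP) < 0)
            failures++;

    for (i = 0; i < n; i++)            /* phase 2: selective signal */
        if (!spare[i] && kill(pids[i], sig) < 0)
            failures++;

    for (i = 0; i < n; i++)            /* phase 3: let them run again */
        if (kill(pids[i], SIGCONT) < 0)
            failures++;

    return failures;
}
```

With a plain kill(-1, SIGTERM) there is no selective phase 2, so the
freeze buys nothing.  Once some processes are skipped, stopping
everything first keeps the set from changing underneath the loop.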

>> No.  However it is possible to get a notification when the child wakes
>> up, and even more when the child is killed (SIGCHLD).
> Right, but that doesn't help at all, since nsenter is sleeping. It'll
> get the notification, when it wakes up, but there's nothing to wake it
> up.

I believe you could make it all work with a poll based on a periodic
wake up.  If you don't send yourself SIGSTOP I don't believe the outer
shell will recognize what has happened, but you should be able to wake
yourself up with a timer and verify your child is still stopped.
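
Something along these lines is what I have in mind.  This is only a
sketch under my own assumptions, not what nsenter does today: the names
peek_child and watch_stopped_child, the poll interval and the poll limit
are all invented here.  It relies on waitid() with WNOWAIT, which
reports the child's state without consuming it, so the same stop can be
observed on every wakeup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum child_state { CHILD_RUNNING, CHILD_STOPPED, CHILD_GONE, CHILD_ERROR };

/* Look at the child's state without blocking and without reaping it. */
static enum child_state peek_child(pid_t pid)
{
    siginfo_t info;

    assert(pid > 0);
    memset(&info, 0, sizeof(info));    /* si_pid stays 0 if nothing to report */
    if (waitid(P_PID, (id_t)pid, &info,
               WEXITED | WSTOPPED | WNOHANG | WNOWAIT) < 0)
        return CHILD_ERROR;            /* e.g. ECHILD: not our child */
    if (info.si_pid == 0)
        return CHILD_RUNNING;          /* no waitable state change */
    if (info.si_code == CLD_STOPPED || info.si_code == CLD_TRAPPED)
        return CHILD_STOPPED;
    return CHILD_GONE;                 /* exited, killed or dumped core */
}

/* Wake up every interval_ms and re-check the child, for at most
 * max_polls wakeups, returning as soon as it is no longer stopped. */
static enum child_state watch_stopped_child(pid_t pid, unsigned interval_ms,
                                            unsigned max_polls)
{
    struct timespec ts;
    enum child_state st = peek_child(pid);

    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (long)(interval_ms % 1000) * 1000000L;

    while (st == CHILD_STOPPED && max_polls-- > 0) {
        nanosleep(&ts, NULL);
        st = peek_child(pid);
    }
    return st;
}
```

The parent never stops itself here, so it can still notice when the
child is continued or killed.  The cost is exactly the one I mentioned:
the outer shell's job control has no idea anything was stopped.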

>> Well when this happens with ssh "reboot & exit" has a pretty good track
>> record of working.  'shutdown -r "now + 1 minute" &' might even be
>> better.
>> 
>> When you are interactive I don't imagine going "doh!" and typing fg
>> is going to be particularly hard either.
> We're trying to get things to work without kludges like sleeping or 
> manual prodding. For debugging that's fine, but people use systemd-nspawn
> containers for services, and expect them to "just work".

Like I said below, nsenter is about the interactive/debugging case.  More
sysadmin debugging than application debugging but I guess that makes
it debugging.

Essentially that was the point of implementing setns for the pid
namespace.  So that exceptional cases could be handled without having to
go to the expense of depending on a functioning login daemon in
the container.
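
For reference, the bare mechanism looks roughly like the sketch below.
The function names are mine and purely illustrative, and it needs
sufficient privilege (CAP_SYS_ADMIN) to actually succeed.  The part that
surprises people is that setns() on a pid namespace does not move the
caller; only children forked afterwards land in the namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: arrange for future children of the caller to be
 * created in the pid namespace of process `target`.
 * Returns 0 on success, -1 with errno set on failure. */
static int join_pid_namespace(pid_t target)
{
    char path[64];
    int fd, r, saved;

    snprintf(path, sizeof(path), "/proc/%ld/ns/pid", (long)target);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    r = setns(fd, CLONE_NEWPID);       /* affects children only */
    saved = errno;
    close(fd);
    errno = saved;
    return r;
}

/* Hypothetical helper: fork a child inside the target's pid namespace
 * and exec argv there.  Returns the child's pid as seen from the
 * caller's namespace, or -1 on failure. */
static pid_t spawn_in_pid_namespace(pid_t target, char *const argv[])
{
    pid_t child;

    assert(argv != NULL && argv[0] != NULL);
    if (join_pid_namespace(target) < 0)
        return -1;

    child = fork();
    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);                    /* exec failed */
    }
    return child;
}
```

The child started this way has no parent inside the container, which is
where the PPID=0 at the top of this thread comes from.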

>> So I guess I am saying I would bias nsenter towards the interactive users
>> rather than scripted automation.
> Agreed.
>
>> > For systemd-nspawn, we'll grow our own facility to enter the
>> > container, since we want to set the environment and find the container
>> > by name and in general integrate with systemd-nspawn. So there's
>> > little reason to modify nsenter for this purpose. 
>> 
>> Sounds reasonable to me.  Just make certain multiple roots in the pid
>> namespace don't mess you up.
> Yeah, multiple roots with unkillable zombie processes surely are enough
> to make people confused. I'm still trying to wrap my head around PID
> and mount namespaces, and I know that user namespaces add another level
> of fun :).

Well this isn't a case of unkillable zombies.  This is a case of
processes not being reaped.  And a weird process that doesn't fully die
until all of its children are dead (similar to the leader of a thread
group).  Now honestly I would not be averse in theory to handling this
weird corner case better in zap_pid_ns_processes.  Right now
zap_pid_ns_processes is the best we have so far.

For launching new services in a container simply sending a message to
the init process is probably what you want.  I think those messages
already traverse unix domain sockets so it isn't too shabby.

Eric