[systemd-devel] systemd freezes after rshd execution, if network connection is down

Tue Apr 29 02:33:50 PDT 2014

On 28.04.2014 13:33, Jimmy Assarsson wrote:
> Hi,
> 
> We stumbled upon a freeze/block in systemd.
> The problem occurs when a socket-activated rshd session has finished while the network connection is down, and systemd then closes the socket.
> This causes a long (60-second) freeze during which it is not possible to communicate with systemd.
> Do you have any idea what is causing this, or how we can investigate it further?
> 
> 
> To reproduce the problem:
> 1) Install the latest Arch Linux on the target (systemd) machine
> 2) On a remote machine, execute
>    rsh $target_ip -l root 'sleep 40'
> 3) On the systemd machine, set the link down on the interface that has $target_ip assigned
>    ip link set down dev $if
> 4) On the systemd machine, wait for 'sleep 40' to complete, then execute any systemd command
>    systemctl list-jobs
> 5) After 60 seconds, systemd responds again
> 
> 
> The stack trace (see below) shows that systemd is closing a socket and is blocked in the close() system call. So it's probably not a systemd problem, but systemd is affected by it.
> 
> We've successfully reproduced the problem on different hardware architectures (x86_64, arm, cris), systemd versions (208, 210, 212) and rshd implementations (netkit-rsh-0.17, inetutils 1.9.2-1). The problem occurs not only when the interface's link is set down, but also when the IP address is removed or the Ethernet cable is unplugged. ssh does not seem to be affected by the problem.
> 
> 
> We generated a core dump:
> kill -SIGABRT 1
> 
> Here is the stack trace (the machine is running systemd 210). 
> (gdb) bt
> #0  0xb6f4d830 in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:46
> #1  0x000527e8 in crash.4282 (sig=6) at apps/systemd/systemd/src/core/main.c:156
> #2  <signal handler called>
> #3  0xb6f4c28c in close () from target/armv6-axis-linux-gnueabi/lib/libpthread.so.0
> #4  0x0009417c in close_nointr (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:167
> #5  0x00094250 in close_nointr_nofail (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:191
> #6  0x00073e0c in service_close_socket_fd.9824 (s=s@entry=0x1b6f918) at apps/systemd/systemd/src/core/service.c:229
> #7  0x00079728 in service_set_state.9835 (s=s@entry=0x1b6f918, state=SERVICE_DEAD) at apps/systemd/systemd/src/core/service.c:1496
> #8  0x00079b70 in service_enter_dead.9847 (s=0x1b6f918, f=<optimized out>, allow_restart=<optimized out>)
>     at apps/systemd/systemd/src/core/service.c:1852
> #9  0x00065470 in service_sigchld_event (u=0x1b6f918, pid=<optimized out>, code=1, status=0)
>     at apps/systemd/systemd/src/core/service.c:3037
> #10 0x00073490 in invoke_sigchld_event.5410 (m=m@entry=0x1ad7360, u=0x1b6f918, si=0xbe862670, si@entry=0xbe862668)
>     at apps/systemd/systemd/src/core/manager.c:1430
> #11 0x00054084 in manager_dispatch_sigchld.5415 (m=m@entry=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1477
> #12 0x000629b0 in manager_dispatch_signal_fd.part.32 (userdata=<optimized out>) at apps/systemd/systemd/src/core/manager.c:1723
> #13 manager_dispatch_signal_fd.5363 (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x1ad7360)
>     at apps/systemd/systemd/src/core/manager.c:1508
> #14 0x0003e880 in source_dispatch (s=0x1ad7758) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:1861
> #15 0x00041288 in sd_event_run (e=0x1ad61d8, timeout=<optimized out>) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:2117
> #16 0x000103c8 in manager_loop (m=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1844
> #17 main (argc=1, argv=0xbe862ee4) at apps/systemd/systemd/src/core/main.c:1704
> 
> Thanks,
> Jimmy

Hmm, reminds me of:

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required

http://oroboro.com/dealing-with-network-port-abuse-in-sockets-in-c/
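
Both links are about SO_LINGER. As a possible explanation only (nothing in this thread confirms it): if rshd enables SO_LINGER with a 60-second timeout on the connection, the option belongs to the socket itself, not to one file descriptor. systemd keeps its own copy of the connection fd, so whichever process closes the socket last performs the lingering close, and with the link down the unsent data can never be acknowledged. That would match the 60 seconds spent in close() in frame #3. Below is a minimal Python sketch of how the option is set and read back on Linux. The helper names set_linger and get_linger are made up for this illustration and are not part of rshd or systemd.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } on Linux
LINGER_FMT = "ii"


def set_linger(sock, enabled, seconds):
    """Set SO_LINGER on sock (hypothetical helper for illustration)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, int(enabled), seconds))


def get_linger(sock):
    """Return (enabled, seconds) as currently set on sock."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    onoff, seconds = struct.unpack(LINGER_FMT, raw)
    return bool(onoff), seconds


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Assumed rshd-like setting: the last close() may block for up to
    # 60 seconds while unsent data is still queued.
    set_linger(s, True, 60)
    print("linger:", get_linger(s))

    # Linger enabled with timeout 0: close() returns at once and the
    # connection is reset instead of being shut down gracefully.
    set_linger(s, True, 0)
    print("linger:", get_linger(s))

    s.close()
```

To check whether this applies here, one could call get_linger() (or the equivalent getsockopt() in C) on the connection fd that the service receives, or strace rshd and look for a setsockopt(..., SO_LINGER, ...) call.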