[systemd-devel] systemd freezes after rshd execution, if network connection is down

Umut Tezduyar Lindskog umut at tezduyar.com
Tue May 13 10:40:53 PDT 2014


It is also reproducible just by losing the carrier on the link. Maybe the new
async close is a candidate to solve it.
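
The async close idea can be sketched like this. Instead of calling close() directly from the event loop, PID 1 hands the descriptor to a helper thread, so only that thread waits if the kernel lingers. This is a minimal sketch assuming POSIX threads. The names close_async() and close_thread() are hypothetical and do not reflect systemd's actual implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Thread body: performs the possibly slow close() away from the event loop. */
static void *close_thread(void *arg) {
    int fd = (int)(intptr_t)arg;
    (void)close(fd); /* may block for the linger timeout; only this thread waits */
    return NULL;
}

/* Hypothetical helper (not systemd's API): hand fd to a detached thread and
 * return at once. Returns 0 if the close was deferred to a thread, or a
 * positive error number if thread creation failed, in which case fd has
 * been closed synchronously instead. */
int close_async(int fd) {
    pthread_attr_t attr;
    pthread_t tid;
    int r;

    assert(fd >= 0); /* caller must pass an open descriptor */

    r = pthread_attr_init(&attr);
    if (r != 0) {
        (void)close(fd);
        return r;
    }
    (void)pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    r = pthread_create(&tid, &attr, close_thread, (void *)(intptr_t)fd);
    (void)pthread_attr_destroy(&attr);
    if (r != 0)
        (void)close(fd); /* no thread available: fall back to a blocking close */
    return r;
}
```

Once close_async() returns, the descriptor belongs to the helper thread, so the caller must not use or close it again. The thread only hides the delay from the event loop; the connection is still torn down according to whatever SO_LINGER setting the socket carries.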

On Tuesday, April 29, 2014, Harald Hoyer <harald.hoyer at gmail.com> wrote:

> On 28.04.2014 13:33, Jimmy Assarsson wrote:
> > Hi,
> >
> > We stumbled upon a freeze/block in systemd.
> > The problem occurs when an rshd (socket-activated) execution has
> > completed, the network connection is down, and systemd closes the socket.
> > This causes a long (60-second) freeze during which it is not possible to
> > communicate with systemd.
> > Do you have any idea what is causing this, or how we can investigate
> > it further?
> >
> >
> > To reproduce the problem:
> > 1) Get the latest Arch Linux.
> > 2) On the remote machine, execute
> >    rsh $target_ip -l root 'sleep 40'
> > 3) On the systemd machine, set the link down on the interface that is
> >    assigned $target_ip:
> >    ip link set down dev $if
> > 4) On the systemd machine, wait for 'sleep 40' to complete, then execute
> >    any systemd command, for example:
> >    systemctl list-jobs
> > 5) After 60 seconds, systemd responds again.
> >
> >
> > By looking at the stack trace (see below), one can see that we are
> > trying to close a socket and are waiting in the close() system call. So
> > it's probably not a systemd problem; however, systemd is affected by it.
> >
> > We've successfully reproduced the problem on different hardware
> > architectures (x86_64, arm, cris), systemd versions (208, 210, 212) and
> > rshd implementations (netkit-rsh-0.17, inetutils 1.9.2-1). The problem
> > occurs not only when the interface's link is set down, but also when the
> > IP address is removed or the Ethernet cable is unplugged. ssh does not
> > seem to be affected by the problem.
> >
> >
> > We generated a core dump:
> > kill -SIGABRT 1
> >
> > Here is the stack trace (the machine is running systemd 210).
> > (gdb) bt
> > #0  0xb6f4d830 in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:46
> > #1  0x000527e8 in crash.4282 (sig=6) at apps/systemd/systemd/src/core/main.c:156
> > #2  <signal handler called>
> > #3  0xb6f4c28c in close () from target/armv6-axis-linux-gnueabi/lib/libpthread.so.0
> > #4  0x0009417c in close_nointr (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:167
> > #5  0x00094250 in close_nointr_nofail (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:191
> > #6  0x00073e0c in service_close_socket_fd.9824 (s=s@entry=0x1b6f918) at apps/systemd/systemd/src/core/service.c:229
> > #7  0x00079728 in service_set_state.9835 (s=s@entry=0x1b6f918, state=SERVICE_DEAD) at apps/systemd/systemd/src/core/service.c:1496
> > #8  0x00079b70 in service_enter_dead.9847 (s=0x1b6f918, f=<optimized out>, allow_restart=<optimized out>)
> >     at apps/systemd/systemd/src/core/service.c:1852
> > #9  0x00065470 in service_sigchld_event (u=0x1b6f918, pid=<optimized out>, code=1, status=0)
> >     at apps/systemd/systemd/src/core/service.c:3037
> > #10 0x00073490 in invoke_sigchld_event.5410 (m=m@entry=0x1ad7360, u=0x1b6f918, si=0xbe862670, si@entry=0xbe862668)
> >     at apps/systemd/systemd/src/core/manager.c:1430
> > #11 0x00054084 in manager_dispatch_sigchld.5415 (m=m@entry=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1477
> > #12 0x000629b0 in manager_dispatch_signal_fd.part.32 (userdata=<optimized out>) at apps/systemd/systemd/src/core/manager.c:1723
> > #13 manager_dispatch_signal_fd.5363 (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x1ad7360)
> >     at apps/systemd/systemd/src/core/manager.c:1508
> > #14 0x0003e880 in source_dispatch (s=0x1ad7758) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:1861
> > #15 0x00041288 in sd_event_run (e=0x1ad61d8, timeout=<optimized out>) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:2117
> > #16 0x000103c8 in manager_loop (m=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1844
> > #17 main (argc=1, argv=0xbe862ee4) at apps/systemd/systemd/src/core/main.c:1704
> >
> > Thanks,
> > Jimmy
>
> Hmm, reminds me of:
>
>
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
>
> http://oroboro.com/dealing-with-network-port-abuse-in-sockets-in-c/
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>
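
The linked discussions concern the SO_LINGER socket option. According to socket(7), when SO_LINGER is enabled, close() does not return until all queued data for the socket has been sent or the linger timeout has been reached; with the link down nothing can be sent, so the caller waits for the full timeout. Socket options belong to the socket rather than to a single descriptor, so an option set by the service on its copy of the connection also applies when systemd closes its own copy. That rshd enables a 60-second linger is an assumption here, suggested only by the matching 60-second freeze. The sketch below, with hypothetical helper names, checks the setting with getsockopt() and switches lingering off so that a later close() returns immediately.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: report whether SO_LINGER is enabled on fd.
 * Returns 1 if enabled (timeout in seconds stored in *seconds),
 * 0 if disabled, or -errno on failure. */
int linger_enabled(int fd, int *seconds) {
    struct linger l;
    socklen_t len = sizeof(l);

    assert(fd >= 0); /* caller must pass an open socket */
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &l, &len) < 0)
        return -errno;
    if (seconds)
        *seconds = l.l_linger;
    return l.l_onoff != 0;
}

/* Hypothetical helper: turn lingering off so a subsequent close() returns
 * immediately. Returns 0 on success or -errno on failure. */
int disable_linger(int fd) {
    struct linger l = { .l_onoff = 0, .l_linger = 0 };

    assert(fd >= 0); /* caller must pass an open socket */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) < 0)
        return -errno;
    return 0;
}
```

Disabling lingering restores the default behavior, in which the kernel keeps trying to deliver queued data after close() has returned. The alternative from the first link, l_onoff = 1 with l_linger = 0, makes close() reset the connection instead and discard any unsent data.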

