Infinite connection loop

Aleksander Morgado aleksander at aleksander.es
Fri Oct 26 08:23:57 UTC 2018


Hey,

> After looking into this a lot more, I’m still no closer to a solution or a root cause. For a while, I thought enabling debug logging for pppd (via the NM_PPP_DEBUG environment variable for NetworkManager) resolved the issue. However, it didn’t do so reliably. For both the debug and non-debug cases, I captured straces. In this particular instance, the debug case didn’t show any problems re-establishing a connection after the connection and pppd were terminated, while the non-debug case did. I have attached both straces. When looking at the straces, a couple of things seemed odd to me:
>
> 1. After the modem hangs up, the TTY is in a state where ioctls to the TTY return I/O errors. How is pppd supposed to restore the termios settings once the line is hung up?

This is the CLOCAL set/unset that I referred to in the previous email.
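
For reference, a minimal sketch of what setting and unsetting CLOCAL
looks like at the termios level. This assumes "CLOCAL set/unset" means
toggling the CLOCAL bit in the TTY's c_cflag; the helper names
termios_set_clocal() and tty_set_clocal() are made up for illustration
and are not taken from pppd, NM or MM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical helper: set or clear CLOCAL in an in-memory termios.
 * With CLOCAL set, the driver ignores the modem control lines (e.g.
 * carrier detect); with it cleared, a carrier drop is reported as a
 * hangup on the TTY. */
static void termios_set_clocal(struct termios *tio, bool enable)
{
    assert(tio != NULL);
    if (enable)
        tio->c_cflag |= CLOCAL;
    else
        tio->c_cflag &= ~(tcflag_t)CLOCAL;
}

/* Hypothetical helper: apply the change to an open TTY.
 * Returns 0 on success, -1 on error; errno is whatever tcgetattr()
 * or tcsetattr() left behind. */
static int tty_set_clocal(int fd, bool enable)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    termios_set_clocal(&tio, enable);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

If the TTY is already hung up, tcgetattr() itself is expected to fail,
which would match the I/O errors seen in the straces.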

> 2. NetworkManager doesn’t wait to signal pppd with SIGTERM until pppd is in PHASE_DEAD, even though this should be happening. Ref https://github.com/paulusmack/ppp/issues/6#issuecomment-51176255

I believe I fixed that in NM a while back.

> 3. The termios flags present when pppd starts up differ for the debug/non-debug cases: c_iflags=0x5 vs 0x4 and c_lflags=0 vs 0x8a21. Why would that be the case? Doesn’t ModemManager set this up deterministically? Does it matter?
>

You mean with MM in debug mode or not in debug mode? That's surprising.

> Besides that, my current working theory is that the problem is somehow caused by the periodic connection status check. If that detects a lost connection (+CGACT: 1,0) before pppd has terminated, it will emit the MM_PORT_CONNECTED notification which calls back to port_connected(). There, we try to re-acquire exclusive access to the TTY which will fail because pppd is still running. Afterwards, data_watch_enable starts to watch the TTY which immediately triggers with the modem hangup.
>

Oh, that could definitely be related as well... if pppd is still
running and MM is the one detecting the context disconnection, we may
be trying to reconfigure the port with our input callbacks before pppd
has exited and released the port. Testing that would be easy: just
disable the periodic connection checks in mm-broadband-bearer.c:
   base_bearer_class->load_connection_status = NULL;
   base_bearer_class->load_connection_status_finish = NULL;

If this is the case, we may need to temporarily disable the connection
checks for the case where the data port is a TTY, and rely only on
NM/pppd detecting the disconnections, until we find a better way to
sync all this.
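
A rough sketch of that idea, as a standalone model and not actual MM
code: the Bearer struct, the DataPortType enum and the function names
below are all hypothetical, and it assumes (as in the test above) that
a NULL load_connection_status means no periodic checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port types; not the real ModemManager enum. */
typedef enum {
    DATA_PORT_NET,
    DATA_PORT_TTY
} DataPortType;

typedef bool (*LoadConnectionStatusFn)(void);

/* Hypothetical stand-in for a bearer and its class method. */
typedef struct {
    DataPortType           data_port_type;
    LoadConnectionStatusFn load_connection_status; /* NULL: no checks */
} Bearer;

/* Example status loader, only here so the pointer can be non-NULL. */
static bool example_load_connection_status(void)
{
    return true;
}

/* Decide whether the periodic connection check should run.
 * Checks are skipped when no loader is set, and also when the data
 * port is a TTY, leaving disconnection detection to NM/pppd. */
static bool bearer_should_poll_connection_status(const Bearer *bearer)
{
    assert(bearer != NULL);
    if (bearer->load_connection_status == NULL)
        return false;
    return bearer->data_port_type != DATA_PORT_TTY;
}
```

The point of gating on the port type is that net ports keep the
periodic check, while TTY ports never get reconfigured by MM while
pppd may still own them.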

-- 
Aleksander
https://aleksander.es

