Infinite connection loop
Sven Schwermer
sven at svenschwermer.de
Fri Oct 26 08:10:47 UTC 2018
Hi,
After looking into this a lot more, I’m still not closer to a solution or a root cause. For a while, I thought enabling debug logging for pppd (via the NM_PPP_DEBUG environment variable for NetworkManager) resolved the issue. However, it didn’t reliably. For the debug and non-debug case, I captured straces. In this particular instance, the debug-case didn’t show any problems re-establishing a connection after the connection and pppd was terminated while the non-debug case did. I have attached both straces. When looking at the straces, a couple of things seemed odd to me:
1. After the modem hangs up, the TTY is in a state where ioctls to the TTY return I/O errors. How is pppd supposed restore the termios settings when it’s hung up?
2. NetworkManager doesn’t wait to signal pppd with SIGTERM until pppd is in PHASE_DEAD, even though this should be happening. Ref https://github.com/paulusmack/ppp/issues/6#issuecomment-51176255
3. The termios flags present when pppd starts up differ for the debug/non-debug cases: c_iflags=0x5 vs 0x4 and c_lflags=0 vs 0x8a21. Why would that be the case? Doesn’t ModemManager set this up deterministically? Does it matter?
Besides that, my current working theory is that the problem is somehow caused by the periodic connection status check. If that detects a lost connection (+CGACT: 1,0) before pppd has terminated, it will emit the MM_PORT_CONNECTED notification which calls back to port_connected(). There, we try to re-acquire exclusive access to the TTY which will fail because pppd is still running. Afterwards, data_watch_enable starts to watch the TTY which immediately triggers with the modem hangup.
Sven
> On 24 Oct 2018, at 16:46, Sven Schwermer <sven at svenschwermer.de> wrote:
>
> It looks like I celebrated prematurely. As soon as I disabled the debug-level logging for NetworkManager and ModemManager, the issue started to appear again. This tells me that there is some sort of race condition. I’ll look into this more but I’d appreciate if one of the maintainers could explain the desired/expected series of events when the peer hangs up and pppd terminates.
>
> Thank you!
>
> Sven
>
>> On 23 Oct 2018, at 17:18, Sven Schwermer <sven at svenschwermer.de> wrote:
>>
>> Hi again,
>>
>> I think, I have found a fix for this. With the following two patches applied to NetworkManager 1.12.2, the issue seems to be gone (final confirmation pending):
>>
>> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd
>> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9
>>
>> I suspect, the latter made the difference because pppd basically hangs when the bus method call doesn’t return.
>>
>> Sven
>>
>>> On 23 Oct 2018, at 11:37, Sven Schwermer <sven at svenschwermer.de> wrote:
>>>
>>> Hi Aleksander, Matthew,
>>>
>>> I looked into this and the issue that Aleksander referred to in the other thread (https://bugzilla.gnome.org/show_bug.cgi?id=734347). In the issue, Aleksander mentions that MM needs to be informed about the broken connection after pppd has terminated so that the CLOCAL setting is restored when MM interacts with the TTY again. However, if I interpret my logs correctly, MM detects itself that the connection is broken based on the modem’s response to AT+CGACT? (+CGACT: 1,0). This then triggers the notification signal MM_PORT_CONNECTED (with connected=false) which makes port_connected try to acquire the serial port lock which then fails.
>>>
>>> Sven
>>>
>>>> On 22 Oct 2018, at 16:52, Matthew Starr <mstarr at hedonline.com> wrote:
>>>>
>>>> Aleksander,
>>>>
>>>> This is the exact same issue I reported a couple months ago:
>>>> https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html
>>>>
>>>> I am currently working around it by adding a NetworkManager dispatcher script that on ppp0 going down, I restart ModemManager.
>>>>
>>>> I never did find a solution to the issue even after trying some modifications to the serial force close code (src/mm-port-serial.c) in ModemManager. It looks like the force_closed boolean gets set and is never reset, which causes the serial port to never retry communications over the serial port again.
>>>>
>>>> Regards,
>>>> Matthew Starr
>>>>
>>>>> Date: Mon, 22 Oct 2018 11:39:37 +0200
>>>>> From: Sven Schwermer <sven at svenschwermer.de>
>>>>> To: Aleksander Morgado <aleksander at aleksander.es>
>>>>> Cc: modemmanager-devel at lists.freedesktop.org
>>>>> Subject: Re: Infinite connection loop
>>>>> Message-ID: <41689DBE-1755-4F85-BFA7-
>>>>> 97CF6E588F41 at svenschwermer.de>
>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>
>>>>> I looked a little closer into this now.
>>>>>
>>>>> Here’s what I think is happening:
>>>>> 1. My cell operator terminates the LCP after some time of inactivity. This is
>>>>> confirmed by my operator.
>>>>> 2. pppd receives the termination request, acknowledges it and the modem
>>>>> hangs up the connection.
>>>>> 3. ModemManager fails to re-acquire serial port lock because of the hangup.
>>>>> 4. ModemManager sets the connection state to ‘registered’.
>>>>> 5. ModemManager force-closes the serial port.
>>>>> 6. NetworkManager registers the lost connection and tries to re-establish it.
>>>>> 7. The ModemManager fails to re-open serial port because it had been force-
>>>>> closed.
>>>>>
>>>>> pppd doesn’t appear to be re-launched. How is this all supposed to work?
>>>>> I’m sure the software stack should be able to handle a terminated
>>>>> connection and re-dial it?
>>>>>
>>>>> Please find attached the debug-level log. The LCP termination happens at Oct
>>>>> 21 14:53:51.
>>>>>
>>>>> Best regards,
>>>>> Sven
>>>>>
>>>>> -------------- next part --------------
>>>>> A non-text attachment was scrubbed...
>>>>> Name: excerpt.long.gz
>>>>> Type: application/x-gzip
>>>>> Size: 53662 bytes
>>>>> Desc: not available
>>>>> URL: <https://lists.freedesktop.org/archives/modemmanager-
>>>>> devel/attachments/20181022/1e9e348d/attachment.bin>
>>>>> -------------- next part --------------
>>>>>
>>>>>
>>>>>> On 20 Oct 2018, at 16:26, Sven Schwermer <sven at svenschwermer.de>
>>>>> wrote:
>>>>>>
>>>>>> Hi again,
>>>>>>
>>>>>> I now managed to capture the event in the journal. I limited the auto
>>>>> connect retried to 4 in order to keep the log size manageable. I didn’t have
>>>>> debug-level logging active for NetworkManager at the time. I’ll enable that
>>>>> now and try to reproduce. Please find the log attached. The error seems to
>>>>> be happening at Oct 20 07:44:26.
>>>>>>
>>>>>> Best regards,
>>>>>> Sven
>>>>>>
>>>>>> <ed6e56cf2ac5b6bdb2e3ca97276ebd6d.journal.log>
>>>>>>
>>>>>>> On 20 Oct 2018, at 13:47, Aleksander Morgado
>>>>> <aleksander at aleksander.es> wrote:
>>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>>> I am running an embedded system with a u-blox SARA-U270 modem and
>>>>> NetworkManager 1.12.2 / ModemManager 1.8.0. For some reason, the
>>>>> modem connection seems to break over time. I am not quite sure why that’s
>>>>> the case, but when I look into the journal, I see NetworkManager and
>>>>> ModemManager spewing out log messages at a very high rate. This actually
>>>>> causes significant system load (mostly by dbus-daemon). Here’s an excerpt
>>>>> of the messages:
>>>>>>>>
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (4/8): Wait to get fully enabled
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (5/8): Register
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (6/8): Bearer
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (4/8): Wait to get fully enabled
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (5/8): Register
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (6/8): Bearer
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>>> [1539780869.7776] modem-broadband[ttyACM0]: failed to connect modem:
>>>>> Could not open serial device ttyACM0: it has been forced close
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.7778] device (ttyACM0): state change: prepare -> failed (reason
>>>>> 'unknown', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.7807] manager: NetworkManager state is now
>>>>> CONNECTED_LOCAL
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>>> [1539780869.7840] device (ttyACM0): Activation: failed for connection
>>>>> 'gprsconnection'
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.7860] device (ttyACM0): state change: failed -> disconnected
>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.7923] policy: auto-activating connection 'gprsconnection'
>>>>> (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.8005] device (ttyACM0): Activation: starting connection
>>>>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.8015] device (ttyACM0): state change: disconnected -> prepare
>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.8043] manager: NetworkManager state is now CONNECTING
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (4/8): Wait to get fully enabled
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (5/8): Register
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (6/8): Bearer
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (4/8): Wait to get fully enabled
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (5/8): Register
>>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>>> (6/8): Bearer
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>>> [1539780869.9321] modem-broadband[ttyACM0]: failed to connect modem:
>>>>> Could not open serial device ttyACM0: it has been forced close
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9323] device (ttyACM0): state change: prepare -> failed (reason
>>>>> 'unknown', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9350] manager: NetworkManager state is now
>>>>> CONNECTED_LOCAL
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>>> [1539780869.9399] device (ttyACM0): Activation: failed for connection
>>>>> 'gprsconnection'
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9430] device (ttyACM0): state change: failed -> disconnected
>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9490] policy: auto-activating connection 'gprsconnection'
>>>>> (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9569] device (ttyACM0): Activation: starting connection
>>>>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9585] device (ttyACM0): state change: disconnected -> prepare
>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>>> [1539780869.9653] manager: NetworkManager state is now CONNECTING
>>>>>>>>
>>>>>>>> It continues like that forever. My modem network connection config is
>>>>> as follows:
>>>>>>>>
>>>>>>>> [connection]
>>>>>>>> id=gprsconnection
>>>>>>>> type=gsm
>>>>>>>> autoconnect=true
>>>>>>>> autoconnect-retries=0
>>>>>>>> [gsm]
>>>>>>>> number=*99#
>>>>>>>> [ipv4]
>>>>>>>> dns-search=
>>>>>>>> method=auto
>>>>>>>> may-fail=false
>>>>>>>>
>>>>>>>> I have been looking through the code and what I don’t understand is
>>>>> why the ttyACM0 would ever be forcefully closed. What could be an
>>>>> explanation for this?
>>>>>>>>
>>>>>>>> Any insight is appreciated.
>>>>>>>>
>>>>>>>
>>>>>>> Do you have a full log of when this issue happens? If you can
>>>>>>> reproduce this more or less frequently, could you setup both
>>>>>>> ModemManager and NetworkManager to dump debug logs and try to get
>>>>> it
>>>>>>> reproduced? This issue could be to a wrong synchronization between MM
>>>>>>> and NM/pppd, in particular related to the CLOCAL settings in each case
>>>>>>> (i.e. MM shouldn't get HUPs in the serial port unless for the cases
>>>>>>> where the port is going away).
>>>>>>>
>>>>>>> --
>>>>>>> Aleksander
>>>>>>> https://aleksander.es
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181026/e95c61b6/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hhe0-pppd-strace-debug.gz
Type: application/x-gzip
Size: 10174 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181026/e95c61b6/attachment-0002.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181026/e95c61b6/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hhe0-pppd-strace-nodebug.gz
Type: application/x-gzip
Size: 9478 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181026/e95c61b6/attachment-0003.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181026/e95c61b6/attachment-0005.html>
More information about the ModemManager-devel
mailing list